(57) Abstract: EKG sensors (150) are placed on a patient (140) to receive electrocardiogram
(EKG) recording signals, which are typically combinations of original signals from different
sources, such as pacemaker signals, QRS complex signals, and irregular oscillatory signals that
suggest an arrhythmia condition. A computing module (120) uses independent component
analysis to separate the recorded EKG signals. The separated signals are displayed to help
physicians analyze heart conditions and identify probable locations of abnormal heart
conditions. At least a portion of the separated signals can be further displayed in a chaos
phase space portrait to help detect abnormality in heart conditions.
SYSTEM AND METHOD FOR SEPARATING CARDIAC SIGNALS
Background of the Invention

Field of the Invention

The present invention relates to medical devices for recording cardiac signals and
separating the recorded cardiac signals.

Description of the Related Art

Electrocardiogram (EKG) recording is a valuable tool for physicians to study patient heart
conditions. In a typical 12-lead arrangement, up to 12 sensors are placed on a subject's chest or
abdomen and limbs to record the electric signals from the beating heart. Each sensor, along with a
reference electrode, forms a separate channel that produces an individual signal. The signals from
the different sensors are recorded on an EKG machine as different channels. The sensors are
usually unipolar or bipolar electrodes or other devices suitable for measuring the electrical potential
on the surface of a human body. Since different parts of the heart, such as the atria and ventricles,
produce different spatial and temporal patterns of electrical activity on the body surface, the signals
recorded on the EKG machine are useful for analyzing how well individual parts of the heart are
functioning.

A typical heartbeat signal has several well-characterized components. The first component
is a small hump at the beginning of a heartbeat called the "P-Wave." This signal is produced by the
right and left atria. There is a flat area after the P-Wave which is part of what is called the PR
Interval. During the PR interval the electrical signal is traveling through the atrioventricular
(AV) node. The next large spike in the heartbeat signal is called the "QRS Complex." The QRS
Complex is a tall, spiky signal produced by the ventricles. Following the QRS complex is another
smaller bump in the signal called the "T-Wave," which represents the electrical resetting of the
ventricles in preparation for the next signal. When the heart beats continuously, the P-QRS-T waves
repeat over and over.

Many publications have described studying cardiac signals and detecting abnormal heart
conditions. Sample publications include U.S. Patent Publication No. 20020052557; Podrid &
Kowey, Cardiac Arrhythmia: Mechanisms, Diagnosis, and Management, Lippincott Williams &
Wilkins Publishers (2nd edition, August 15, 2001); Marriott & Conover, Advanced Concepts in
Arrhythmias, Mosby Inc. (3rd edition, January 15, 1998); and Josephson, M.E., Clinical Cardiac
Electrophysiology: Techniques and Interpretations, Lippincott Williams & Wilkins Publishers;
ISBN (3rd edition, December 15, 2001).

Unfortunately, although EKG signals have been studied for decades, they are difficult to
assess because EKG signals recorded at the surface are mixtures of signals from multiple sources.
Typically, it is relatively straightforward to measure the shape of the QRS complex since this signal
is so strong. However, irregularly shaped P-wave or T-wave signals, along with weak irregular
oscillatory signals that suggest a heart arrhythmia, are often masked by large pacemaker signals or
the strong QRS complex signals. Thus, it can be very difficult to isolate small irregular oscillatory
signals and to identify arrhythmia conditions.

In addition, atrial and ventricular signals are sometimes undesirably superimposed over one
another. In many cases, diagnosis of disease states requires these signals to be separated from one
another. For example, it might be desirable to separate P wave signals from QRS complex signals,
so that signals originating in an atrium are isolated from signals representing concurrent activities in
the ventricle.

In some practices the EKG signals are electronically "filtered" by excluding signals of
certain frequencies. The signals are also "averaged" to remove largely random or asynchronous
data, which is assumed to be meaningless "noise." The filtering and averaging methods
irreversibly eliminate portions of the recorded signals. In addition, it is not proven whether the
more random data is truly "noise" and truly meaningless. It might be that the signals that are
removed are indicative of a disease state in a patient. Another method, disclosed in U.S. Patent
No. 6,308,094 entitled "System for prediction of cardiac arrhythmias," uses the Karhunen-Loeve
Transformation to decompose or compress cardiac signals into elements that are deemed
"significant." As a result, the information that is deemed "insignificant" is lost.

Compared to other signal separation applications, separating EKG recording signals
presents additional challenges. For example, the sources are not always stationary since the heart
chambers contract and expand during beating. Additionally, the activity of a single chamber may
be mistaken for multiple sources because of the presence of moving waves of electrical activity
across the heart. If electrodes are not securely attached to the patient, or if the patient moves (for
example, older patients may suffer from uncontrolled jittering), the movement of the electrodes also
undesirably generates signals. In addition, the EKG can sense multiple signals that are
unrelated to the cardiac signature, such as myopotentials, i.e., electrical signals from muscles other
than the heart.

Cardiac rhythm management systems that store a list of triggers have been disclosed.
U.S. Patent No. 6,400,982 entitled "Cardiac rhythm management system with arrhythmia
prediction and prevention" discloses such a system. If a trigger matches detected cardiac signals
from a patient, the system calculates the probability of arrhythmia and activates a prevention
therapy for the patient. However, the cardiac signals are in fact mixtures of signals from multiple
sources, and the signals that are important for arrhythmia detection can be masked by other signals.
It is therefore desirable to separate the cardiac signals used in the cardiac rhythm management
systems.

Independent component analysis (ICA) is a technique for separating mixed source signals
(components) which are presumably independent from each other. In its simplified form,
independent component analysis applies an "un-mixing" matrix of weights to the mixed signals, for
example by multiplying the matrix with the mixed signals, to produce separated signals. The weights
are assigned initial values, and then adjusted to minimize information redundancy in the separated
signals. Because this technique does not require information on the source of each signal, it is
known as a "blind source separation" method. Blind separation problems refer to the idea of
separating mixed signals that come from multiple independent sources. Although there are many
ICA techniques currently known, most have evolved from the original work described in U.S.
Patent No. 5,706,402 issued on January 6, 1998. Additional references on ICA and blind source
separation can be found in, for example, A. J. Bell and T. J. Sejnowski, Neural Computation
7:1129-1159 (1995); Te-Won Lee, Independent Component Analysis: Theory and Applications, Kluwer
Academic Publishers, Boston, September 1998; Hyvarinen et al., Independent Component Analysis,
1st edition (Wiley-Interscience, May 18, 2001); Mark Girolami, Self-Organizing Neural Networks:
Independent Component Analysis and Blind Source Separation (Perspectives in Neural Computing)
(Springer Verlag, September 1999); and Mark Girolami (Editor), Advances in Independent
Component Analysis (Perspectives in Neural Computing) (Springer Verlag, August 2000). Singular
value decomposition algorithms have been disclosed in Adaptive Filter Theory by Simon Haykin
(Third Edition, Prentice-Hall (NJ), 1996).

It has been suggested to use chaos theory to analyze cardiac signals to detect abnormal
heart conditions. Sample disclosures include U.S. Patent Nos. 5,439,004, 5,342,401, 5,447,520 and
5,456,690; PCT application Nos. WO02/34123 and WO0224276; and Smith et al., Electrical Alternans
and Cardiac Electrical Instability, Circulation, Vol. 77, No. 1, pp. 110-121 (January 1988). Other
approaches are disclosed in U.S. Patent No. 5,447,520 issued to Spano, et al. and U.S. Patent No.
5,201,321 issued to Fulton. Chaos theory is defined as the study of complex nonlinear dynamic
systems. "Complex" implies just that, "nonlinear" implies recursion and higher mathematical
algorithms, and "dynamic" implies non-constant and non-periodic. Thus chaos theory is, very
generally, the study of changing complex systems based on mathematical concepts of recursion,
whether in the form of a recursive process or a set of differential equations modeling a physical
system.

When a bounded chaotic system has some kind of long-term pattern, but the pattern is not a
simple periodic oscillation or orbit, then the system has a "Strange Attractor." If the system's
behavior is plotted in a graph over an extended period, patterns can be discovered that are not
obvious in the short term. In addition, in these types of systems, no matter what the initial
conditions are, usually the same pattern is found to emerge. The area for which this recurring
pattern holds true is called the "basin of attraction" for the attractor. Chaos theory methods have
been described in, for example, N. H. Packard, J. P. Crutchfield, J. Doyne Farmer, and R. S. Shaw,
Geometry of a Time Series, Physical Review Letters, 47 (1980), p. 712; F. Takens, Detecting
Strange Attractors in Turbulence, in Lecture Notes in Mathematics 898, D. A. Rand and L. S.
Young, eds., (Berlin: Springer-Verlag, 1981), p. 336; and J. P. Crutchfield, J. Doyne Farmer, N. H.
Packard, and R. S. Shaw, On Determining the Dimension of Chaotic Flows, Physica 3D, (1981),
pp. 605-17.

For all of these reasons, what is needed in the art is a system that can accurately separate 
medical signals from one another in order to diagnose disease states. 

Summary of the Invention

The present application discloses systems and methods for using independent component
analysis to determine the existence and location of anomalies such as arrhythmias of a heart. The
disclosed systems and methods can be applied to suggest the location of atrial fibrillation, and to
locate arrhythmogenic regions of a chamber of the heart using heart cycle signals measured from a
body surface of the patient. Non-invasive localization of the ectopic origin allows focal treatment to
be quickly targeted to effectively inhibit these complex arrhythmias without having to rely on
widespread and time-consuming sequential searches or on massively invasive simultaneous
intracardiac sensor techniques. The effective localization of these complex arrhythmias can be
significantly enhanced by using independent component analysis to separate superimposed heart
cycle signals originating from differing chambers or regions of the heart tissue. In addition, the
signals that are separated by ICA are preferably also analyzed by plotting them on a chaos phase
space portrait.

One aspect of the invention relates to a medical system for separating cardiac signals. This
aspect includes a receiving module to receive recorded cardiac signals from medical sensors, a
computing module to separate the received signals using independent component analysis to
produce separated signals, and a display module to display the separated signals.

Another aspect of the invention relates to a method of detecting arrhythmia in a patient.
The method includes placing EKG sensors on a patient to produce recorded EKG signals, sending
the recorded signals to a computing module to separate the recorded signals into separated signals
using independent component analysis, and reviewing a display of the separated signals to
determine the existence of arrhythmia in the patient. In a preferred embodiment, each component
of separated signals corresponds to a channel of recorded signals and its sensor location. Therefore,
when one or more components of separated signals that suggest arrhythmia are detected, the
corresponding one or more sensor locations also suggest the location of arrhythmia.

Yet another aspect of the invention relates to a cardiac rhythm management system. The
system includes a cardiac signal recording module to record cardiac signals of a patient, a
computing module to separate the recorded signals into separated signals using independent
component analysis, and a detection module to detect or to predict an abnormal condition based on
analyzing the separated signals. The system also includes a treatment module to treat the patient or
a warning module to issue a warning when the abnormal condition is detected or predicted.

Other aspects and embodiments of the invention are described below in the detailed 
description section or defined by the claims. 

Brief Description of the Drawings
FIGURE 1 is a diagram of an EKG system according to one embodiment of the invention.

FIGURE 2 is a flowchart illustrating one embodiment of a process for separating cardiac
signals.

FIGURE 3A is a sample chart of recorded EKG signals.

FIGURE 3B is a sample chart of separated EKG signals.

FIGURE 3C is a sample chart of one component of separated signals back projected on the
recorded signals.

FIGURE 4A is a chaos phase space portrait of three components of separated EKG signals
of a healthy subject.

FIGURE 4B is a chaos phase space portrait of three components of separated EKG signals
of a subject with an abnormal heart condition.

Detailed Description of the Preferred Embodiment 
Embodiments of the invention relate to a system and method for accurately separating
medical signals in order to determine disease states in a patient. In one embodiment, the system
analyzes EKG signals in order to determine whether a patient has a heart ailment or irregularity. As
discussed in detail below, embodiments of the system utilize the techniques of independent
component analysis to separate the medical signals from one another.

In addition to the signal separation technique, embodiments of the invention also relate to
systems and methods that first separate signals using ICA, and then perform an analysis on a
specific isolated signal, or set of isolated signals, using a "chaos" analysis. As described earlier,
chaos theory (also called nonlinear dynamics) studies patterns that are not completely random, but
cannot be determined by simple formulas. Because cardiac signals are typically non-random, but
cannot be easily described by a simple formula, chaos theory analysis as described below provides
an effective tool to analyze these signals and determine disease states.

Accordingly, once the signals are separated using ICA, they can be plotted to produce a
chaos phase space portrait. By reviewing the patterns in the phase space portrait, for example
reviewing the existence and location of one or more attractors, or comparing established healthy
patterns and established abnormal patterns with the patterns of the patient, a user is able to assess
the likelihood of abnormality in the signals, which indicates disease conditions in the patient.

FIGURE 1 is a diagram of an EKG system that includes a computing module for signal
separation according to one embodiment of the present invention. As shown in FIGURE 1,
electrode sensors 150 are placed on the chest and limbs of a patient 140 to record electric signals.
The electrodes send the recorded signals to a receiving module 110 of the EKG system 100. After
optionally performing signal amplification, analog-to-digital conversion or both, the receiving
module 110 sends the received signals to a computing module 120 of the EKG system 100. The
computing module 120 uses an independent component analysis method to separate the recorded
signals to produce separated signals. The independent component analysis method is
described in detail in the Appendix and below with respect to FIGURE 2.

The computing module 120 can be implemented in hardware, software, or a combination of
both. It can be located physically within the EKG system 100 or connected to the recorded signals
received by the EKG system 100. A displaying module 130, which includes a printer or a monitor,
displays the separated signals on paper or on screen. The displaying module 130 can be located
within the EKG system 100 or connected to it. Optionally, the displaying module 130 also displays
the recorded signals on paper or on screen. In one embodiment, the displaying module also
displays some components of the separated signals in a chaos phase space portrait.

In one embodiment, the EKG system 100 also includes a database (not shown) that stores
recognized EKG signal triggers and corresponding diagnoses. The triggers refer to conditions that
indicate the likelihood of arrhythmia. For example, triggers can include sinus beats, premature
sinus beats, beats following long sinus pauses, long-short beat sequences, R on T-wave beats,
ectopic ventricular beats, premature ventricular beats, and so forth. Triggers can include threshold
values that indicate arrhythmia, such as threshold values of ST elevations, heart rate, increase or
decrease in heart rate, late-potentials, abnormal autonomic activity, and so forth. A left
bundle-branch block diagnosis can be associated with triggers such as the absence of q wave in
leads I and V6, a QRS duration of more than 120 msec, small notching of R wave, etc.

Triggers can be based on a patient's history, for example the percentage of abnormal beats
detected during an observation period, the percentage of premature or ectopic beats detected during
an observation period, heart rate variation during an observation period, and so forth. Triggers may
also include, for example, the increase or decrease of ST elevation in beat rate, the increase in
frequency of abnormal or premature beats, and so forth.

A matching module (not shown) attempts to match the separated signals with one or more
of the stored triggers. If a match is found, the matching module displays the matched
corresponding diagnosis, or sends a warning to a healthcare worker or to the patient. Methods such
as computer-implemented logic rules, classification trees, expert system rules, statistical or
probability analysis, pattern recognition, database queries, artificial intelligence programs and
others can be used to match the separated signals with stored triggers.
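
As an illustration of the simplest of these options, computer-implemented logic rules, the following Python sketch matches a set of features against stored triggers. It is only a sketch under stated assumptions: the `Trigger` class, the feature names, and the premature-beat threshold are hypothetical and are not taken from the disclosure. Only the left bundle-branch block conditions (no q wave in leads I and V6, QRS duration above 120 msec) come from the text above, and how the features would be extracted from the separated signals is left open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Trigger:
    """A stored trigger: a named rule plus the diagnosis it suggests (hypothetical type)."""
    name: str
    diagnosis: str
    matches: Callable[[Dict[str, object]], bool]


# Hypothetical trigger database. The feature keys are illustrative names only.
TRIGGERS: List[Trigger] = [
    Trigger(
        name="wide QRS without q wave in leads I and V6",
        diagnosis="left bundle-branch block",
        # Conditions taken from the text: no q wave in I and V6, QRS > 120 msec.
        matches=lambda f: (f["qrs_duration_ms"] > 120
                           and not f["q_wave_in_lead_I"]
                           and not f["q_wave_in_lead_V6"]),
    ),
    Trigger(
        name="high percentage of premature beats",
        diagnosis="increased likelihood of arrhythmia",
        # The 10 percent threshold is a placeholder, not a clinical value.
        matches=lambda f: f["premature_beat_pct"] > 10.0,
    ),
]


def match_triggers(features: Dict[str, object]) -> List[str]:
    """Return the diagnoses of every stored trigger that the features satisfy."""
    return [t.diagnosis for t in TRIGGERS if t.matches(features)]


example_features = {
    "qrs_duration_ms": 135,
    "q_wave_in_lead_I": False,
    "q_wave_in_lead_V6": False,
    "premature_beat_pct": 2.0,
}
matched = match_triggers(example_features)
```

A real matching module could replace the lambda rules with any of the other listed methods, such as classification trees or statistical analysis, without changing the surrounding structure.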

FIGURE 2 is a flowchart illustrating one embodiment of a process for separating EKG
signals. The process starts from a start block 202, and proceeds to a block 204, where the
computing module 120 of the EKG system 100 receives the recorded signals X_j from the electrode
sensors, with J being the number of channels. Prior to processing, the signals can be amplified to
strengths suitable for computer processing. Analog-to-digital conversion of signals can also be
performed.

From the block 204, the process proceeds to a block 206, where the initial values for an
"un-mixing" matrix of scaling weights W_ij are selected. In one embodiment, the initial values for a
matrix of initial weights W_i0 are also selected. The process then proceeds to a block 208, where a
plurality of training signals Y_i are produced by operating the matrix on the recorded signals. In a
preferred embodiment, the training signals are produced by multiplying the matrix with the
recorded signals such that Y_i = W_ij * X_j (summed over the J channels). In one embodiment, the
initial weights W_i0 are included such that Y_i = W_ij * X_j + W_i0. The process proceeds from the
block 208 to a block 210, where the scaling weights W_ij and optionally the initial weights W_i0
are adjusted to reduce the information redundancy among the training signals. Methods of
adjusting the weights are described in the Appendix.

The process proceeds to a decision block 212, where the process determines whether the
information redundancy has been reduced to a satisfactory level. The criteria for the determination
are described in the Appendix. If the process determines that information redundancy among
the training signals has been reduced to a satisfactory level, then the process proceeds to a block
214, where the training signals are displayed as separated signals Y_i, with I being the number of
components for the separated signals. In a preferred embodiment, I, the number of components of
separated signals, is equal to J, the number of channels of recorded signals. If the information
redundancy has not been reduced to a satisfactory level, the process returns from the block 212 to
the block 208 to again adjust the weights. From the block 214, the process proceeds to an end
block 216.
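
The loop of blocks 206 through 212 can be sketched in Python with NumPy as follows. This is a minimal sketch, not the method of the Appendix: it assumes a natural-gradient infomax update with a logistic nonlinearity, a sphering (whitening) pre-step, and a stopping rule based on the size of the weight change, all of which are illustrative choices. The function name `infomax_ica`, its parameters, and the synthetic Laplacian test sources are likewise assumptions.

```python
import numpy as np


def infomax_ica(X, lr=0.1, max_iter=2000, tol=1e-7):
    """Separate recorded signals X (J channels x N samples); returns (W, Y) with Y = W @ Xc."""
    J, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)           # remove each channel's mean
    # Sphering step (an assumption, used here only to make the iteration well behaved).
    evals, evecs = np.linalg.eigh(Xc @ Xc.T / N)
    sphere = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = sphere @ Xc
    W = np.eye(J)                                     # block 206: initial weights
    for _ in range(max_iter):
        Y = W @ Z                                     # block 208: training signals
        # Block 210: natural-gradient infomax step for a logistic nonlinearity,
        # using the identity 1 - 2*sigmoid(y) = -tanh(y / 2).
        step = lr * (np.eye(J) - np.tanh(Y / 2.0) @ Y.T / N) @ W
        W = W + step
        if np.max(np.abs(step)) < tol:                # block 212: assumed stopping rule
            break
    W_total = W @ sphere                              # un-mixing matrix for the centered data
    return W_total, W_total @ Xc


# Synthetic check: three independent super-Gaussian sources, linearly mixed.
rng = np.random.default_rng(0)
S = rng.laplace(size=(3, 5000))
A = rng.normal(size=(3, 3))
X = A @ S
W, Y = infomax_ica(X)
# Absolute correlation between each separated component (rows) and each source (columns).
corr = np.abs(np.corrcoef(np.vstack([Y, S]))[:3, 3:])
```

Separated components come back in arbitrary order and with arbitrary sign and scale, which is why the check compares absolute correlations rather than the signals themselves.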

For the un-mixing matrix W with the final weight values, its rows represent the time
courses of relative strengths/activity levels (and relative polarities) of the respective separated
components. Its weights give the surface topography of each component, and provide evidence for
the components' physiological origins. For the inverse of matrix W, its columns represent the
relative projection strengths (and relative polarities) of the respective separated components onto
the channels of recorded signals. The back projection of the ith independent component onto the
recorded signal channels is given by the outer product of the ith row of the separated signals matrix
with the ith column of the inverse un-mixing matrix, and is expressed in the channels of the
original recorded signals. Thus cardiac dynamics or activities of interest accounted for by single or
by multiple components can be obtained by projecting one or more ICA components back onto the
recorded signals, X = W^-1 * Y, where Y is the matrix of separated signals, Y = W * X.
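
The back projection described above can be written in a few lines of NumPy. The sketch below assumes that W is a square, invertible un-mixing matrix and that Y holds one separated component per row. The function name `back_project` and the random test data are illustrative only.

```python
import numpy as np


def back_project(W, Y, components):
    """Project the selected separated components back onto the recorded channels."""
    W_inv = np.linalg.inv(W)
    X_part = np.zeros((W_inv.shape[0], Y.shape[1]))
    for i in components:
        # Outer product of the ith column of W^-1 with the ith row of Y.
        X_part += np.outer(W_inv[:, i], Y[i, :])
    return X_part


rng = np.random.default_rng(1)
X = rng.normal(size=(4, 50))                                  # stand-in recorded signals
W = 5.0 * np.eye(4) + rng.uniform(-1.0, 1.0, size=(4, 4))     # diagonally dominant, so invertible
Y = W @ X                                                     # separated signals

X_single = back_project(W, Y, [2])        # contribution of one component to every channel
X_all = back_project(W, Y, range(4))      # all components together reproduce X
```

Summing the back projections of all components recovers the recorded signals, since X = W^-1 * Y. Projecting a single component shows how much of each recorded channel that component accounts for.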

The separated signals are determined by the ICA method to be statistically independent and
are presumed to be from independent sources. Regardless of whether there is in fact some
dependence between the separated EKG signals, test results show that the separated signals provide
a beneficial perspective for physicians to detect and to locate the abnormal heart conditions of a
patient.

In a preferred embodiment, time-delay between source signals is ignored. Since the
sampling frequencies of cardiac signals are in the relatively low 200-500 Hz range, the effect of
time-delay can be neglected.

Improved methods of ICA can be used to speed up the signal separation process. In one
embodiment, a generalized Gaussian mixture model is used to classify the recorded signals into
mutually exclusive classes. The classification methods have been disclosed in U.S. Patent
Application No. 09/418,099 titled "Unsupervised adaptation and classification of multiple classes
and sources in blind source separation" and PCT Application No. WO0127874 titled "Unsupervised
adaptation and classification of multi-source data using a generalized Gaussian mixture model." In
another embodiment, the computing module 120 incorporates a priori knowledge of cardiac
dynamics, for example supposing separated QRS components to be highly kurtotic and (ar)rhythmic
component(s) to be sub-Gaussian. ICA methods with incorporated a priori knowledge have been
disclosed in T-W. Lee, M. Girolami and T. J. Sejnowski, Independent Component Analysis using an
Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources, Neural
Computation, 1999, Vol. 11(2): 417-441.
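
To make the terms "highly kurtotic" and "sub-Gaussian" concrete, the following sketch computes the excess kurtosis of two synthetic waveforms. The spike train standing in for a QRS-like component and the sinusoid standing in for an oscillatory component are illustrative assumptions, not data from the disclosure, and the helper name `excess_kurtosis` is hypothetical.

```python
import numpy as np


def excess_kurtosis(y):
    """Fourth standardized moment minus 3 (zero for a Gaussian signal)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    m2 = np.mean(y ** 2)
    m4 = np.mean(y ** 4)
    return m4 / m2 ** 2 - 3.0


t = np.arange(1000) / 1000.0

spiky = np.zeros(1000)
spiky[::200] = 1.0                       # sparse, tall spikes: super-Gaussian
oscillatory = np.sin(2 * np.pi * 5 * t)  # steady oscillation: sub-Gaussian

k_spiky = excess_kurtosis(spiky)         # large and positive
k_osc = excess_kurtosis(oscillatory)     # negative (-1.5 for a pure sinusoid)
```

An extended infomax style algorithm can use the sign of such a statistic to choose a super-Gaussian or sub-Gaussian source model for each component.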

FIGURE 3A illustrates a ten-second portion of 12 channels of signals that were gathered as
part of an EKG recording. The horizontal axis in FIGURE 3A represents time progression of ten
seconds. The vertical axis represents channel numbers 1 to 12. The signals of FIGURE 3A are, in
this case, from a patient that provided a mixture of multiple signals, including QRS complex
signals, pacemaker signals, multiple oscillatory activity signals, and noise. However, because these
signals were all occurring simultaneously, they cannot be easily separated from one another using
conventional EKG equipment.

In contrast, FIGURE 3B illustrates output signals separated from the mixture signals of
FIGURE 3A, according to one embodiment of the present invention. As above, the horizontal axis
in FIGURE 3B represents time progression of ten seconds and the vertical axis represents the
separated components 1 to 12. The separated signals in FIGURE 3B are displayed as components 1
to 12 corresponding to the channels 1 to 12 in FIGURE 3A, so that a physician can identify a
separated signal as relating to its respective recorded signal's corresponding sensor location on the
patient body. For example, in a standard 12-lead arrangement, leads II, III and aVF represent
signals from the inferior region. Leads V1 and V2 represent signals from the septal region. Leads V5,
V6, I, and aVL represent signals from the lateral heart. Right and posterior heart regions typically
require special lead placement for recording. To better identify the location of a heart condition,
more than 12 leads can be used. For example, 20, 30, 40, 50, or even hundreds of sensors can be
placed on various portions of a patient's torso. Fewer than 12 leads can also be used. The sensors
are preferably non-invasive sensors located on the patient's body surface, but invasive sensors can
also be used. With separated signals each corresponding to one of the locations, a physician can
review the signals and detect abnormalities that correspond to the respective locations.

As shown in FIGURE 3B, the component #1 represents the pacemaker signals and the early
part of QRS complex signals. The component #2 represents major portions of later parts of the
QRS complex signals. QRS complex signals represent the depolarization of the left ventricle. The
component #10 represents atrial fibrillation (a type of arrhythmia) signals. Therefore atrial
fibrillation is predicted to be located at the sensor location that corresponds to channel #10.
Although components #1 and #10 contain similar frequency contents of oscillatory activity between
heart beats, they capture activities from different spatial locations.

For EKG signals, we discovered that the signals separated using ICA are usually more
independent from each other and have less information redundancy than signals that have not been
processed through ICA. Compared to the recorded signals, the separated signals usually better
represent the signals from the original sources of the patient's heart. In addition to arrhythmia, the
separated cardiac signals can also be used to help detect other heart conditions. For example, the
separated signals, especially the separated QRS complex signals, can be used to detect premature
ventricular contraction. The separated signals, especially the separated Q wave signals, can be used
to detect myocardial infarction. Separating the EKG signals, especially separating the QRS complex
and T wave signals, can help distinguish left and right bundle branch block.

Of course, the disclosed system and method are not limited to detecting arrhythmia, or any
particular type of disease state. Embodiments of the invention include all methods of analyzing
medical signals using ICA. For example, when a pregnant woman undergoes EKG recording, the
heart signals from the woman and from the fetus(es) can be separated.

The separated cardiac signals can be characterized as non-random but not easily
deterministic, which makes them suitable subjects for chaotic analysis. As mentioned above, chaos
theory (also called nonlinear dynamics) studies patterns that are not completely random but cannot
be determined by simple formulas. The separated signals can be plotted to produce a chaos phase
space portrait. By reviewing the patterns in the phase space portrait, including the existence and
location of one or more attractors, a user is able to assess the likelihood of abnormality in the
signals, which indicates disease conditions in the patient.

In a preferred embodiment, the QRS complex signals are separated into three different
components, with each component representing a portion of the QRS complex. The 3 components
are 3 data sets that are found to be temporally statistically independent using independent
component analysis. Using the three components, a 3-dimensional phase space portrait of QRS
complex can be displayed to show the trajectory of the three components.

FIGURE 3C is a sample chart of the component #10 of separated signals (as shown in
FIGURE 3B) back projected onto the recorded signals of FIGURE 3A. The separated signals of
component #10, which indicate arrhythmia, are identified by reference number 302 in FIGURE 3C.
The 12 channels of recorded signals are identified by reference number 304 for ease of
identification. FIGURE 3C therefore allows direct visual comparison of a separated component
against channels of recorded signals. The back projections of cardiac dynamics allow us to examine
the amount of information accounted for by single or by multiple components in the recorded
signals and to confirm the components' physiological meanings suggested by the surface
topography (the aforementioned columns of the inverse of the un-mixing matrix).

FIGURE 4A illustrates the phase space portrait of the EKG recording of a healthy subject.
FIGURE 4B illustrates the phase space portrait of the EKG recording of an atrial fibrillation patient.
In FIGURES 4A and 4B, the x, y, and z axes represent the amplitudes of the 3 QRS components.
The separated signals' values over time are plotted to produce the phase space portraits. In the
healthy EKG recording of FIGURE 4A, the dense cluster 402 indicates the existence of an attractor
that attracts the signal values to the region of the dense cluster 402. The dense cluster 402
represents the most frequent occurrences of the signals. In the atrial fibrillation patient EKG
recording of FIGURE 4B, an additional loop 404, which is not part of the dense cluster 402, lies
below the attractor, closer to the base plane than the dense cluster 402.
This additional loop 404 is presumably due to the oscillatory activity in the baseline portions of the
EKG signals. The separated component #10 signal that indicates an arrhythmia condition is
presumably responsible for the additional loop 404. The visual pattern can be compared with the
visual pattern of a healthy subject and manually recognized as indicative of an abnormal
condition such as atrial fibrillation.
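
As a rough illustration of how such a portrait can be built, the sketch below pairs three component signals into (x, y, z) points and locates the most frequently visited region with a coarse grid. The grid-counting step is an assumption made for this example, not a method stated in the disclosure, and the function names and sample values are hypothetical.

```python
from collections import Counter


def phase_space_points(x, y, z):
    """Pair up three equally long component signals as (x, y, z) points."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("all three components need the same number of samples")
    return list(zip(x, y, z))


def densest_cell(points, cell_size):
    """Return (cell, count) for the grid cell the trajectory visits most often."""
    cells = Counter(
        tuple(int(value // cell_size) for value in point) for point in points
    )
    return cells.most_common(1)[0]


# Made-up samples: eight points near one spot plus a two-point excursion.
x = [0.1, 0.2, 0.15, 0.1, 0.3, 0.2, 0.1, 0.25, 2.5, -1.5]
y = [0.4, 0.5, 0.45, 0.4, 0.6, 0.5, 0.4, 0.55, 2.0, -1.0]
z = [0.7, 0.8, 0.75, 0.7, 0.9, 0.8, 0.7, 0.85, -0.5, -0.6]
points = phase_space_points(x, y, z)
cell, count = densest_cell(points, cell_size=1.0)
```

The list of points can be handed to any 3D plotting routine to draw the portrait; points that fall well outside the densest cell are candidates for an additional loop of the kind described above.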

Instead of the 3 QRS complex components as shown in FIGURE 4B, other components or
more than 3 components can also be used to plot the chaos phase space portrait. If more than 3
components are used, the different components can be plotted in different colors. The 3 QRS
complex components of FIGURE 4B are selected because test results suggest that such a phase
space portrait is physiologically significant and usually functions well as an indication of a patient's
heart condition.

Although FIGURES 3A, 3B, 4A and 4B were produced using test results related to the
detection and localization of focal atrial fibrillation, the disclosed systems and methods can be used
to detect and to localize other heart conditions including focal and re-entrant arrhythmia. The
disclosed systems and methods can also be used to detect and to localize paroxysmal atrial
fibrillation as well as persistent and chronic atrial fibrillation.

The disclosed methods can be used to improve existing implantable cardioverter/defibrillators (ICDs)
that can deliver electrical stimuli to the heart. In addition to existing ICDs and existing
pacemakers, some of the existing cardiac rhythm management devices also combine the functions
of pacemakers and ICDs. A computing module embodying the disclosed methods can be added to
the existing systems to separate the recorded cardiac signals. The separated signals are then used
by the cardiac rhythm management systems to detect or to predict abnormal conditions. Upon
detection or prediction, the cardiac rhythm management system automatically treats the patient, for
example by delivering pharmacologic agents, pacing the heart in a particular mode, delivering
cardioversion/defibrillation shocks to the heart, or applying neural stimulation to the sympathetic or
parasympathetic branches of the autonomic nervous system. Instead of or in addition to automatic
treatment, the system can also issue a warning to a physician, a nurse or the patient. The warning
can be issued in the form of an audio signal, a radio signal, and so forth. The disclosed signal
separation methods can be used in cardiac rhythm management systems in hospitals, in patients'
homes or nursing homes, or in ambulances. The cardiac rhythm management systems include
implantable cardioverter defibrillators, pacemakers, biventricular or other multi-site coordination
devices and other systems for diagnostic EKG processing and analysis. The cardiac rhythm
management systems also include automatic external defibrillators and other external monitors,
programmers and recorders.

In one embodiment, an improved cardiac rhythm management system includes a storage
module that stores the separated signals. In one arrangement, the storage module can be removed
from the cardiac rhythm management system and connected to a computing device. In another
arrangement, the storage module is directly connected to a computing device without being
removed from the cardiac rhythm management system. The computing device can provide further
analysis of the separated signals, for example displaying a chaos phase space portrait using some of
the separated signals. The computing device can also store the separated signals to provide a
history of the patient's cardiac signals.

The disclosed methods can also be applied to predict the occurrence of arrhythmia within a
patient's heart. After separating recorded EKG signals into separated signals, the separated signals
can be matched with stored triggers and diagnoses as described above. If the separated signals
match stored triggers that are associated with arrhythmia, an occurrence of arrhythmia is predicted.
In other embodiments, an arrhythmia probability is then calculated, for example based on how
closely the separated signals match the stored triggers, based on records of how frequently
the patient's separated signals have matched the stored triggers in the past, and/or based on how
frequently the patient has actually suffered arrhythmia in the past. The calculated probability can
then be used to predict when the next arrhythmia will occur for the patient. Based on statistics and
clinical data, calculated probabilities can be associated with specified time periods within which an
arrhythmia will occur.
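
The text does not specify how closeness of match or the probability is computed. The sketch below is therefore purely illustrative: it assumes Pearson correlation as the match measure and, as a hypothetical choice, multiplies the match closeness by the patient's historical rate of arrhythmia following a match. The names `match_score` and `arrhythmia_probability` and all numbers are made up for this example.

```python
import math


def match_score(signal, trigger):
    """Pearson correlation between a separated-signal segment and a stored trigger."""
    n = len(signal)
    if n == 0 or n != len(trigger):
        raise ValueError("signal and trigger must be non-empty and equally long")
    mean_s = sum(signal) / n
    mean_t = sum(trigger) / n
    dev_s = [s - mean_s for s in signal]
    dev_t = [t - mean_t for t in trigger]
    denom = math.sqrt(sum(d * d for d in dev_s) * sum(d * d for d in dev_t))
    if denom == 0.0:
        return 0.0  # a flat segment carries no shape information
    return sum(a * b for a, b in zip(dev_s, dev_t)) / denom


def arrhythmia_probability(score, past_matches, arrhythmias_after_match):
    """Hypothetical estimate: match closeness scaled by the historical hit rate."""
    if past_matches <= 0:
        raise ValueError("at least one past match is needed for a history rate")
    closeness = min(1.0, max(0.0, score))  # ignore anti-correlated segments
    return closeness * (arrhythmias_after_match / past_matches)


# Made-up trigger template and a scaled, shifted copy of it as the "recording".
trigger = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0]
segment = [2.0 * v + 3.0 for v in trigger]
score = match_score(segment, trigger)
probability = arrhythmia_probability(score, past_matches=10, arrhythmias_after_match=4)
```

Correlation is insensitive to amplitude scaling and baseline offset, which is why the scaled and shifted segment still matches the trigger; a clinical system would need a validated measure and calibrated probabilities rather than this product.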

In addition to EKG signals, the disclosed systems and methods can be applied to separate
other electrical signals such as electroencephalogram signals, electromyographic signals,
electrodermographic signals, and electroneurographic signals. They can be applied to separate
other types of signals, such as sonic signals, optic signals, pressure signals, magnetic signals and
chemical signals. The disclosed systems and methods can be applied to separate signals from
internal sources, for example within a cardiac chamber, within a blood vessel, and so forth. The
disclosed systems and methods can be applied to separate signals from external sources such as the
skin surface or away from the body. They can also be applied to record and to separate signals
from animal subjects.

Although the foregoing has described certain preferred embodiments, other embodiments
will be apparent to those of ordinary skill in the art from the disclosure herein. Additionally, other
combinations, omissions, substitutions and modifications will be apparent to the skilled artisan in
view of the disclosure herein. Accordingly, the present invention is not to be limited by the
preferred embodiments, but is to be defined by reference to the following claims.

The present application incorporates by reference U.S. Patent No. 5,706,402, titled "Blind
signal processing system employing information maximization to recover unknown signals through
unsupervised minimization of output redundancy", filed November 28, 1994, in its entirety as an
APPENDIX as follows.

United States Patent No. 5,706,402
Inventor: Anthony J. Bell 

Blind signal processing system employing information maximization to recover
unknown signals through unsupervised minimization of output redundancy

[...] unknown source signal from the output of an unknown [...]

REFERENCE TO GOVERNMENT RIGHTS

The U.S. Government has rights in the invention disclosed and claimed herein pursuant to Office of
Naval Research grant no. N00014-93-1-0631.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to systems for recovering the original unknown signals subjected
to transfer through an unknown multichannel system by processing the known output signals
therefrom and relates specifically to an information-maximizing neural network that uses
unsupervised learning to recover each of a multiplicity of unknown source signals in a
multichannel having reverberation.

2. Description of the Related Art

Blind Signal Processing: In many signal processing applications, the sample signals provided by
the sensors are mixtures of many unknown sources. The "separation of sources" problem is to
extract the original unknown signals from these known mixtures. Generally, the signal sources as
well as their mixture characteristics are unknown. Without knowledge of the signal sources other
than the general statistical assumption of source independence, this signal processing problem is
known in the art as the "blind source separation problem". The separation is "blind" because
nothing is known about the statistics of the independent source signals and nothing is known
about the mixing process.

The blind separation problem is encountered in many familiar forms. For instance, the well-known
"cocktail party" problem refers to a situation where the unknown (source) signals are sounds
generated in a room and the known (sensor) signals are the outputs of several microphones. Each
of the source signals is delayed and attenuated in some (time varying) manner during transmission
from source to microphone, where it is then mixed with other independently delayed and
attenuated source signals, including [...] mixtures of two speaking voices reaches one of two
microphones [...] by a superconducting quantum interference device (SQUID) array in
magnetoencephalography. Other important examples of the blind source separation problem include
sonar array signal processing and signal decoding in cellular telecommunication systems.

The blind source separation problem is closely related to the more familiar "blind deconvolution"
problem, where a single unknown source signal is extracted from a known mixed signal that
includes many time-delayed versions of the source originating from channel multipath distortion or
reverberation or echoes. The need for blind deconvolution [...]. For instance, high-speed data
transmission over a [...] training mode that transmits a known training sequence to establish
deconvolution parameters or in a blind mode.

The class of communication systems that may need blind equalization capability includes
high-capacity line-of-sight digital radio (cellular telecommunications). Such a channel suffers
from anomalous propagation conditions arising from natural conditions, which can degrade digital
radio performance by causing the transmitted signal to propagate along several paths of different
electrical length (multipath fading). Severe multipath fading requires a blind equalization scheme
to recover channel operation.

In reflection seismology, a reflection coefficient sequence can be blindly extracted from the
received signal, which includes echoes produced at the different reflection points of the unknown
geophysical model. The traditional linear-predictive seismic deconvolution method used to remove
the source waveform from a seismogram ignores valuable phase information contained in the
reflection seismogram. This [...] is overcome by using blind deconvolution to process the received
signal by assuming only a general statistical geological reflection coefficient model.

Blind deconvolution can also be used to recover unknown images that are blurred by transmission
through unknown systems.

Blind Separation Methods: Because of the fundamental importance of both the blind separation and
blind deconvolution signal processing problems, practitioners have proposed several classes of
methods for solving the problems. The blind separation problem was first addressed in 1986 by
Jutten and Herault ("Blind separation of sources, Part I: An adaptive algorithm based on
neuromimetic architecture", Signal Processing 24 (1991) 1-10), who disclose the HJ neural network
with backward connections that can usually solve the simple two-element blind source separation
problem. Disadvantageously, the HJ network iterations may not converge to a proper solution in
some cases, depending on the initial state and on the source statistics. When convergence is
possible, the HJ [...] can be viewed as [...]

Other practitioners have attempted to improve the HJ network to remove some of the
disadvantageous features. For instance, Sorouchyari ("Blind separation of sources, Part III:
Stability analysis", Signal Processing 24 (1991) 21-29) examines other higher-order non-linear
transforming functions other than those simple first and third order functions proposed by Jutten
et al. but concludes that the higher-order functions cannot improve implementation of the HJ
network. In U.S. Pat. No. 5,383,164, filed on Jun. 10, 1993 as application Ser. No. 08/074,940 and
fully incorporated herein by this reference, Li et al. describe a blind source separation system
based on the HJ neural network model
that employs linear beamforming to improve HJ network separation performance. Also, John C. Platt
et al. ("Networks For The Separation of Sources That Are Superimposed and Delayed", Advances in
Neural Information Processing Systems, vol. 4, Morgan-Kaufmann, San Mateo, 1992) propose extending
the original magnitude-optimizing HJ network to estimate a matrix of time delays in addition to
the HJ magnitude mixing matrix. Platt et al. observe that their modified network is disadvantaged
by multiple stable states and unpredictable convergence.

Pierre Comon ("Independent component analysis, a new concept?", Signal Processing 36 (1994)
287-314) provides a detailed discussion of Independent Component Analysis (ICA), which defines a
class of closed form techniques useful for solving the blind identification and deconvolution
problems. As is known in the art, ICA searches for a transformation matrix to minimize the
statistical dependence among components of a random vector. This is distinguished from Principal
Components Analysis (PCA), which searches for a transformation matrix to minimize statistical
correlation among components of a random vector, a solution that is inadequate for the blind
separation problem. Thus, PCA can be applied to minimize second order cross-moments among a vector
of sensor signals while ICA can be applied to minimize sensor signal joint probabilities, which
offers a solution to the blind separation problem. Comon suggests that although mutual information
is an excellent measure of the contrast between joint probabilities, it is not practical because
of computational complexity. Instead, Comon teaches the use of the fourth-order cumulant tensor
(thereby ignoring fifth-order and higher statistics) as a preferred measure of contrast because
the associated computational complexity increases only as the fifth power of the number of unknown
signals.

Similarly, Gilles Burel ("Blind separation of sources: A nonlinear neural algorithm", Neural
Networks 5 (1992) 937-947) asserts that the blind source separation problem is nothing more than
the Independent Components Analysis (ICA) problem. However, Burel proposes an iterative scheme for
ICA employing a back propagation neural network for blind source separation that handles
non-linear mixtures through iterative minimization of a cost function. Burel's network differs
from the HJ network, which does not minimize any cost function. Like the HJ network, Burel's
system can separate the source signals in the presence of noise without attempting noise reduction
(no noise hypotheses are assumed). Also, like the HJ system, practical convergence is not
guaranteed because of the presence of local minima and computational complexity. Burel's system
differs sharply from traditional supervised back-propagation applications because his cost
function is not defined in terms of difference between measured and desired outputs (the desired
outputs are unknown). His cost function is instead based on output signal statistics alone, which
permits "unsupervised" learning in his network.

Blind Deconvolution Methods: The blind deconvolution art can be appreciated with reference to the
text edited by Simon Haykin (Blind Deconvolution, Prentice-Hall, New Jersey 1994), which discusses
four general classes of blind deconvolution techniques, including Bussgang processes, higher-order
cumulant equalization, polyspectra and maximum likelihood sequence estimation. Haykin neither
considers nor suggests specific neural network techniques suitable for application to the blind
deconvolution problem.

Blind deconvolution is an example of "unsupervised" learning in the sense that it learns to
identify the inverse of an unknown linear time-invariant system without any physical access to the
system input signal. This unknown system may be a nonminimum phase system having one or more
zeroes outside the unit circle in the frequency domain. The blind deconvolution process must
identify both the magnitude and the phase of the system transfer function. Although identification
of the magnitude component requires only the second-order statistics of the system output signal,
identification of the phase component is more difficult because it requires the higher-order
statistics of the output signal. Accordingly, some form of non-linearity is needed to extract the
higher-order statistical information contained in the magnitude and phase components of the output
signal. Such non-linearity is useful only for unknown source signals having non-Gaussian
statistics. There is no solution to the problem when the input source signal is
Gaussian-distributed and the channel is nonminimum-phase because all polyspectra of Gaussian
processes of order greater than two are identical to zero.

Classical adaptive deconvolution methods are based almost entirely on second order statistics, and
thus fail to operate correctly for nonminimum-phase channels unless the input source signal is
accessible. This failure stems from the inability of second-order statistics to distinguish
minimum-phase information from maximum-phase information of the channel. A minimum phase system
(having all zeroes within the unit circle in the frequency domain) exhibits a unique relationship
between its amplitude response and phase response so that second order statistics in the output
signal are sufficient to recover both amplitude and phase information for the input signal. In a
nonminimum-phase system, second-order statistics of the output signal alone are insufficient to
recover phase information and, because the system does not exhibit a unique relationship between
its amplitude response and phase response, blind recovery of source signal phase information is
not possible without exploiting higher-order output signal statistics. These require some form of
non-linear processing because linear processing is restricted to the extraction of second-order
statistics.

Bussgang techniques for blind deconvolution can be viewed as iterative polyspectral techniques,
where rationale are developed for choosing the polyspectral orders with which to work and their
relative weights by subtracting a source signal estimate from the sensor signal output. The
Bussgang techniques can be understood with reference to Sandro Bellini ("Chapter 2: Bussgang
Techniques For Blind Deconvolution and Equalization", Blind Deconvolution, S. Haykin (ed.),
Prentice Hall, Englewood Cliffs, N.J., 1994), who characterizes the Bussgang process as a class of
processes having an auto-correlation function equal to the cross-correlation of the process with
itself as it exits from a zero-memory non-linearity.

Polyspectral techniques for blind deconvolution lead to unbiased estimates of the channel phase
without any information about the probability distribution of the input source signals. The
general class of polyspectral solutions to the blind decorrelation problem can be understood with
reference to a second Simon Haykin textbook ("Ch. 20: Blind Deconvolution", Adaptive Filter
Theory, Second Ed., Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1991) and to
Hatzinakos et al. ("Ch. 5: Blind Equalization Based on Higher Order Statistics (HOS)", Blind
Deconvolution, Simon Haykin (ed.), Prentice Hall, Englewood Cliffs, N.J., 1994).

Thus, the approaches in the art to the blind separation and deconvolution problems can be
classified as those using non-linear transforming functions to spin off higher-order
statistics (Jutten et al. and Bellini) and those using explicit calculation of higher-order
cumulants and polyspectra (Haykin and Hatzinakos et al.). The HJ network does not reliably converge
even for the simplest two-source problem and the fourth-order cumulant tensor approach does not
reliably converge because of truncation of the cumulant expansion. There is accordingly a
clearly-felt need for blind signal processing methods that can reliably solve the blind processing
problem for significant numbers of source signals.

Unsupervised Learning Methods: In the biological sensory system arts, practitioners have
formulated neural [...] based on studies of biological sensory systems, which are known to solve
blind separation and deconvolution problems of many kinds. The class of supervised learning
techniques normally used with artificial neural networks are not useful for these problems because
supervised learning requires access to the source signals for training purposes. Unsupervised
learning instead requires [...] signals without access to the source signals.

Practitioners have proposed several rationale for unsupervised learning in biological sensory
systems. For instance, Linsker ("An Application of the Principle of Maximum Information
Preservation to Linear Systems", Advances in Neural Information Processing Systems 1, D. S.
Touretzky (ed.), Morgan-Kaufmann (1989)) shows that his well-known "infomax" principle (first
proposed in 1987) explains why biological sensor systems operate to minimize information loss
between neural layers in the presence of noise. In a later work ("Local Synaptic Learning Rules
Suffice to Maximize Mutual Information in a Linear Network", Neural Computation 4 (1992) 691-702),
Linsker describes a two-phase learning algorithm for maximizing the mutual information between two
layers of a neural network. However, Linsker assumes a linear input-output transforming function
and multivariate Gaussian statistics for both source signals and noise components. With these
assumptions, Linsker shows that a "local synaptic" (biological) learning rule is sufficient to
maximize mutual information but he neither considers nor suggests solutions to the more general
blind processing problem of recovering non-Gaussian source signals in a non-linear transforming
environment.

Simon Haykin ("Ch. 11: Self-Organizing Systems III: Information-Theoretic Models", Neural
Networks: A Comprehensive Foundation, S. Haykin (ed.), MacMillan, New York 1994) [...] network
learning rule used in its implementation. Haykin [...] well-known principles such as the
"minimization of information loss" principle suggested in 1988 by Plumbley et al. and Barlow's
"principle of minimum redundancy", first proposed in 1961, either of which can be used to derive a
class of unsupervised learning rules.

Joseph Atick ("Could information theory provide an ecological theory of sensory processing?",
Network 3 (1992) 213-251) applies Shannon's information theory to the neural processes seen in
biological optical sensors. Atick observes that information redundancy is useful only in noise and
includes two components: (a) unused channel capacity arising from suboptimal symbol frequency
distribution and (b) intersymbol redundancy or mutual information. Atick suggests that optical
neurons apparently evolved to minimize the troublesome intersymbol redundancy (mutual information)
component of redundancy rather than to minimize overall redundancy. H. B. Barlow ("Unsupervised
Learning", Neural Computation 1 (1989) 295-311) also examines this issue and shows that "minimum
entropy coding" in a biological sensory system operates to reduce the troublesome mutual
information component even at the expense of suboptimal symbol frequency distribution. Barlow
shows that the mutual information component of redundancy can be minimized in a neural network by
feeding each neuron output back to other neuron inputs through anti-Hebbian synapses to discourage
correlated output activity. This "redundancy reduction" principle is offered to explain how
unsupervised perceptual learning occurs in animals.

S. Laughlin ("A Simple Coding Procedure Enhances a Neuron's Information Capacity", Z. Naturforsch.
36 (1981) 910-912) proposes that the optical neuron of a blowfly optimizes information capacity
through equalization of the probability distribution for each neural code value (minimizing the
unused channel capacity component of redundancy), thereby confirming Barlow's "minimum redundancy"
principle. J. J. Hopfield ("Olfactory computation and object perception", Proc. Natl. Acad. Sci.
USA 88 (August 1991) 6462-6466) examines the [...] source solution in neurons using the HJ neuron
model for minimizing output redundancy.

Becker et al. ("Self-organizing neural network that discovers surfaces in random-dot stereograms",
Nature, vol. 355, pp. 161-163, Jan. 9, 1992) propose a standard back-propagation neural network
learning model [...] to replace the external teacher (supervised learning) by internally-derived
teaching signals (unsupervised learning). Becker et al. use non-linear networks to maximize mutual
information between different sets of outputs, contrary to the blind signal recovery requirement.
By increasing redundancy, their network [...] groups of inputs, which can be selected out of
information passed forward to improve processing efficiency.

Thus, it is known in the neural network arts that anti-Hebbian mutual interaction can be used to
explain the decorrelation or minimization of redundancy observed in biological vision systems.
This can be appreciated with reference to H. B. Barlow et al. ("Adaptation and Decorrelation in
the Cortex", The Computing Neuron, R. Durbin et al. (eds.), Addison-Wesley (1989)) and to
Schraudolph et al. ("Competitive Anti-Hebbian Learning of Invariants", Advances in Neural
Information Processing Systems 4, J. E. Moody et al. (eds.), Morgan-Kaufmann (1992)). In fact,
practitioners have suggested that Linsker's "infomax" principle and Barlow's "minimum redundancy"
principle may both yield the same neural network learning procedures. Until now, however,
non-linear versions of these procedures applicable to the blind signal processing problem have
been unknown in the art.

The Blind Processing Problem: As mentioned above, blind source separation and blind deconvolution
are related problems in signal processing. The blind source separation problem can be succinctly
stated as where a set of unknown source signals $S_1(t), \ldots, S_I(t)$ are mixed together
linearly by an unknown matrix $[A_{ji}]$. Nothing is known about the sources or the mixing
process, both of which may be time-varying, although the mixing process is assumed to vary slowly
with respect to the source. The blind separation task is to recover the original source signals
from [...] measured superpositions of them, $X_1(t), \ldots, X_J(t)$, by finding a square matrix
$[W_{ij}]$ that is a permutation of the inverse of the unknown matrix $[A_{ji}]$. The blind
deconvolution problem can be similarly stated as where a single unknown signal $S(t)$ is convolved
with an unknown tapped delay-line filter $A_1, \ldots, A_K$, producing the corrupted measured
signal $X(t) = A(t) * S(t)$, where $A(t)$ is the impulse response of the
unknown (perhaps slowly time-varying) filter. The blind deconvolution task is to recover $S(t)$ by
finding and convolving $X(t)$ with a tapped delay-line filter $W_1, \ldots, W_L$ having the
impulse response $W(t)$ that reverses the effect of the unknown filter $A(t)$.

There are many similarities between the two problems. In one, source signals are corrupted by the
superposition of other source signals and, in the other, a single source signal is corrupted by
superposition of time-delayed versions of itself. In both cases, unsupervised learning is required
because no error signals are available and no training signals are provided. In both cases,
second-order statistics alone are inadequate to solve the more general problem. For instance, a
second-order decorrelation technique such as that proposed by Barlow et al. would find
uncorrelated (linearly independent) projections $\{Y_i\}$ of the input sensor signals $\{X_j\}$
when attempting to separate unknown source signals $\{S_i\}$ but is limited to discovering a
symmetric decorrelation matrix that cannot reverse the effects of mixing matrix $[A_{ji}]$ if the
mixing matrix is asymmetric. Similarly, second-order decorrelation techniques based on the
autocorrelation function, such as prediction-error filters, are phase-blind and do not offer
sufficient information to estimate the phase characteristics of the corrupting filter $A(t)$ when
applied to [...]

Thus, both blind signal processing problems require the use of higher-order statistics as well as
certain assumptions regarding source signal statistics. For the blind separation problem, the
sources are assumed to be statistically independent and non-Gaussian. With this assumption, the
problem of learning $[W_{ij}]$ becomes the ICA problem described by Comon. For blind
deconvolution, the original signal $S(t)$ is assumed to be a "white" process consisting of
independent symbols. The blind deconvolution problem then becomes the problem of removing from the
measured signal $X(t)$ any statistical dependencies across time that are introduced by the
corrupting filter $A(t)$. This process is sometimes denominated the "whitening" of $X(t)$.

As used herein, both the ICA procedure and the "whitening" of a time series are denominated
"redundancy reduction". The first class of techniques uses some type of explicit estimation of
cumulants and polyspectra, which can be appreciated with reference to Haykin and Hatzinakos et al.
Disadvantageously, such "brute force" techniques are computationally intensive for high numbers of
sources or taps and may be inaccurate when cumulants higher than fourth order are ignored, as they
usually must be. The second class of techniques uses static non-linear functions, the Taylor
series expansions of which yield higher-order terms. Iterative learning rules containing such
terms are expected to be somehow sensitive to the particular higher-order statistics necessary to
accurate redundancy reduction. This reasoning is used by Comon et al. to explain the HJ network
and by Bellini to explain the Bussgang deconvolver.

Disadvantageously, there is no assurance that the particular higher-order statistics yielded by
the (heuristically) selected non-linear function are weighted in the manner necessary for
achieving statistical independence. Recall that the known approach to attempting improvement [...]
test various non-linear functions selected heuristically and [...] the original functions are not
yet [...] statistical dependency. Until now, this was believed to be practically impossible
because of the infinite number of higher-order statistics associated with statistical dependency.
The related unresolved problems and deficiencies are clearly felt in the art and are solved by
this invention in the manner described below.

SUMMARY OF THE INVENTION

This invention solves the above problem by introducing a new class of unsupervised learning
procedures for a neural network that solve the general blind signal processing problem by
maximizing joint input/output entropy through gradient ascent to minimize mutual information in
the outputs. The network of this invention arises from the unexpectedly advantageous observation
that a particular type of non-linear signal transform creates learning signals with the
higher-order statistics needed to separate unknown source signals by minimizing mutual information
among neural network output signals. This invention also arises from the second unexpectedly
advantageous discovery that mutual information among neural network outputs can be minimized by
maximizing joint output entropy when the learning transform is selected to match the signal
probability distributions of interest.

[...] appreciated as a generalization of the infomax principle to non-linear units with
arbitrarily distributed [...] sources. It is a feature of the system of this invention that [...]
is passed through a predetermined sigmoid function to adaptively maximize information transfer by
optimal alignment of the monotonic sigmoid [...] input signal peak probability density. It is an
advantage of this invention that redundancy is minimized among a multiplicity of outputs merely by
maximizing total [...] producing the independent [...]

The foregoing, together with other objects, features and advantages of this invention, can be
better appreciated with reference to the following specification, claims and the accompanying
drawing.

BRIEF DESCRIPTION OF THE DRAWING

For a more complete understanding of this invention, reference is now made to the following
detailed description of the embodiments as illustrated in the accompanying drawing, wherein:

FIGS. 1A, 1B, 1C and 1D illustrate the feature of sigmoidal transfer function alignment for
optimal information flow in a sigmoidal neuron from the prior art;

FIGS. 2A, 2B and 2C illustrate the blind source separation and blind deconvolution problems from
the prior art;

FIGS. 3A, 3B and 3C provide graphical diagrams illustrating a joint entropy maximization example
where maximizing joint entropy fails to produce statistically independent [...] because of
improper selection of the non-linear transforming function;

FIG. 4 shows the theoretical relationship between the several entropies and mutual information
from the prior art;

FIG. 5 shows a functional block diagram of an illustrative [...]

[...] embodiment of the combined blind source separation and blind decorrelation network of this
invention;
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piCS 8A. 8B and 8C show typical probability density RefenJog to HO. 1A. wbeo a single input x is passed 
funddons to speech, rock music Lxl G«u,.ian white noise; through a transforming function g(x) to pve an output 

before ,and ^+Z*Z£? * pctfannedaccordtogtothe s ft IP^ ^"digid withe steepest sloping portion of 

procedure of this invention, non-linear transferring function g(x). This h equivalent to 

FIG. 10 shows the results of a blind source separation ^ ^ Dt of a neuron kput^utrait function to the 
experiment performed using deprocedure of tiiis invention; attribution of incoming signals that leads to 

and optimal Information flow in sigmoidal neurons shown in 

FIGS. 1LA, UB, 11C 11D, 11E, 11F, 11G, UH, 11L 1U, t0 nGS ic_id. FIG. U> shows a zao-raode distribution 

UK and 11L show time domain filter charts illustrating the matched to the sigmoid function in FIG. 1C. In FIG. 1A. the 

results of the blind deconvolution of several different cor- mput x having a probability distribution f^x) is passed 

rupted human speech signals according to the procedure of through the non-linear sigmoidal function g(x) to produce 

this invention. output signal y having a probability distribution fUy). The 

_.__... JI 1JL U , 15 information in the probability density function fjy) varies 

DETAILED DESOTFTONOFraE respewive to the a^nment of the mean and varW of x 

PREFERRED EMBODIMENTS wiftrespect to the threshold w, and slope w of g(x). When 

This invention arises from the unexpectedly advantageous g( x ) is moootomcaUy increasing or decreasing (thereby 

observation that a class of unsupervised learning rules for having a unique inverse), the output signal probability 

maximizing information transfer in a neural network solves ^ density function Uy) can be written as a function of the 

the Wind signal processing problem by minimizing redun- mpu t signal probability density function fjx) as follows: 
dancy in the network outputs. This class of new learning 
rules is now described in information theoretic terms, first At*) 
for a single input and then for a multiplicity of unknown , - . 

input signals. 25 | d* \ 

Information Maximization For a Single Source 

In a singje-input network, the rnutual bforrnation that the where M denotes absolute value, 
output y ofaTtwork contains about its input x can be Eqn. 3 leads to the unexpected dscovery ofan advanta- 

erwessed as* geous gradient descent process because the output signal 

^ 30 entropy can be expressed in terms of the output signal 

l(»*)*H(yyHW*) 1B P L l ) probability density function as follows: 

where H(y) is the entropy of the output signal, H(ytx) is that 

portion of the output signal entropy that did not come from f +- 

&e input signal and UyX) is the mutual information. Eqn. 1 «O)»-flfaA0')l=-J M>)*w)*y 

can be aj^redated with reference to FIG. 4, which illustrates 33 * "~ 

the well-known relationship between input signal entropy where E[ ) denotes expected value. Substituting Eqn. 3 into 

H(x), output signal entropy H(y) and mutual information Eqn. 4 produces the following: 

When mere is 00 noise or when the noise is treated as r 1 ^ 11 _ flta/jW , ^ 51 

merely another unknown input signal, the mapping between 40 w L I I J 

input x and output y is deterrmnistic ^ ^^nal cnfropy 5 ^ rimply me 

H(ybt) has its lowest possible value, drverging to -minus ^ ^ H (x), which cannot be 

infinity. This divergence is a consequence of megencraU- J U £ r*ramcter w that defines 
ration of hibernation theory " m 

ables. The output entropy K(y) isreaUy du^Ual 45 ™ *^ ^ raaximlzed to maximize the 

entropy of output signal y with ^£ s ^^£ * shmi entropy H(y). Tois first term is the average 

such as the noise level or the gnmu^ofme o^ effect of input signal x on output signal y 

representation of the varies in x and y. Jhc«fl«cre^ ma3drnl2C d byTnsidering the input signals as 

complexities can be avoided by restricting me network to the ■"■g"V r ^ f , x) ^ deriving an online, 
consideration of the *> 

titles with respect to some parameter w. Such gradients are v 

as well-behaved as are discrete-variable entropies because H (Eqn. 6] 

the reference terms involved in the definition of differential w B / \ & \\_{ » \ * ( 

entropies cosappear In particular, Eqn. 1 can be different.- -*T ( to | ^\r{^) ^ ) 

^too^^tc^cspon^^^zsm^s: 55 ^ 6 ^ a icaling roeasure Aw for changing the 

a jap. 21 parameter w to adjust the log of the slope of sigmoid 

" w function. Any sigmoid function can be used to specify 

m9 , ^ K J . » measure Aw, such as the widely-used logistic transfer func- 

because, in the noiseless case, H(ytx) does not depend on w V^T 
and its differential disappears. Thus, for continuous deter- 60 

ministic matchings, the mutual information between net- y=< !-,-■)-*, where ^ [Eqn. 71 

work input and network output can be maximized by maxi- ... 

mixing megradientof the emropy of the output alone, which in which the input x is first aligned wi* the agn^^ ^<>n 

is an unexpectedly advantageous consequence of treating through replication by a scaling weight w and addition of 

ncTe TtZ^LZZ source aignal This permits the 65 a bias weight w 0 to create an aligned signal u which u then 

discussion to continue without knowledge of theinput signal non-Uneariy trasfonncd by the ogistic J™««J*** 0 *" 

^^"^ create signal y. Another useful sigmoid function is the 
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hyperbolic tangent function expressed as y=*anh(u). The 
hyperbolic tangent function is a member of the general class of functions g(x) each representing a solution to the partial differential equation,

$$\frac{d}{dx} g(x) = 1 - |g(x)|^{r} \qquad \text{[Eqn. 8]}$$

with a boundary condition of g(0)=0. The parameter r should be selected appropriately for the assumed kurtosis of the input probability distribution. For kurtosis above 3, either the hyperbolic tangent function (r=2) or the non-member logistic transfer function is well suited for the process of this invention.

For the logistic transfer function (Eqn. 7), the terms in Eqn. 6 can be expressed as:

$$\frac{\partial y}{\partial x} = w\,y\,(1-y) \qquad \text{[Eqn. 9]}$$

$$\frac{\partial}{\partial w}\left(\frac{\partial y}{\partial x}\right) = y\,(1-y)\,\bigl(1 + w x (1-2y)\bigr) \qquad \text{[Eqn. 10]}$$

Dividing Eqn. 10 by Eqn. 9 produces a scaling measure Δw for the scaling weight learning rule of this invention based on the logistic function:

$$\Delta w = \epsilon\left(x(1-2y) + w^{-1}\right) \qquad \text{[Eqn. 11]}$$

where ε>0 is a learning rate.

Similar reasoning leads to a bias measure Δw_0 for the bias weight learning rule of this invention based on the logistic transfer function, expressed as:

$$\Delta w_0 = \epsilon\,(1-2y) \qquad \text{[Eqn. 12]}$$

These two learning rules (Eqns. 11-12) are implemented by adjusting the respective w or w_0 at a learning rate ε, which is usually less than one percent (ε<0.01), as is known in the neural network arts. Referring to FIGS. 1A-1C, if the input probability density function f_x(x) is Gaussian, then the bias measure Δw_0 operates to align the steepest part of the sigmoid curve g(x) with the peak x of f_x(x), thereby matching input density to output slope in the manner suggested intuitively by Eqn. 3. The scaling measure Δw operates to align the edges of the sigmoid curve slope to the particular width (proportional to variance) of f_x(x). Thus, narrow probability density functions lead to sharply-sloping sigmoid functions.

The scaling measure of Eqn. 11 defines an "anti-Hebbian" learning rule with a second "anti-decay" term. The first anti-Hebbian term prevents the uninformative solutions where output signal y saturates at 0 or 1, but such an unassisted anti-Hebbian rule alone allows the slope w to disappear at zero. The second anti-decay term (1/w) forces output signal y away from the other uninformative situation where slope w is so flat that output signal y stabilizes at 0.5 (FIG. 1A).

The combined effect of these two balanced terms is to produce an output probability density function f_y(y) that is close to the flat unit distribution function, which is known to be the maximum entropy distribution for a random variable bounded between 0 and 1. FIG. 1B shows a family of sigmoid output distributions, with the most informative one occurring at sigmoid slope w_opt. Using the logistic transfer function as the non-linear sigmoid transformation, the learning rule in Eqn. 11 eventually brings the slope w to w_opt, thereby maximizing entropy in output signal y. The bias rule in Eqn. 12 centers the mode in the sloping region at w_0 (FIG. 1A).

If the hyperbolic tangent sigmoid function is used, the bias measure Δw_0 then becomes proportional to -2y and the scaling measure Δw becomes proportional to -2xy+w^{-1}, such that Δw_0 = ε(-2y) and Δw = ε(-2xy+w^{-1}), where ε is the learning rate. These learning rules offer the same general features and advantages of the learning rules discussed above in connection with Eqns. 11-12 for the logistic transfer function. In general, any sigmoid function in the class of solutions to Eqn. 8 selected for parametric suitability to a particular input probability distribution can be used in accordance with the process of this invention to solve the blind signal processing problem. These unexpectedly advantageous learning rules can be generalized to the multi-dimensional case.

Joint Entropy Maximization for Multiple Sources

To appreciate the multiple-signal blind processing method of this invention, consider the general network diagram shown in FIG. 2A where the measured input signal vector [X] is transformed by way of the weight matrix [W] to produce a monotonically transformed output vector [Y]=g([W][X]+[W_0]). By analogy to Eqn. 3, the multivariate probability density function of [Y] can be expressed as

$$f_Y([Y]) = \frac{f_X([X])}{|J|} \qquad \text{[Eqn. 13]}$$

where |J| is the absolute value of the Jacobian of the transformation that produces output vector [Y] from input vector [X]. As is well-known in the art, the Jacobian is the determinant of the matrix of partial derivatives:

$$J = \det\left[\frac{\partial y_i}{\partial x_j}\right]_{ij} \qquad \text{[Eqn. 14]}$$


where det(.) denotes the determinant of a square matrix.

By analogy to the single-input case discussed above, the method of this invention maximizes the natural log of the Jacobian to maximize output entropy H(Y) for a given input entropy H(X), as can be appreciated with reference to Eqn. 5. The quantity ln|J| represents the volume of space in [Y] into which points in [X] are mapped. Maximizing this quantity attempts to spread the training set of input points evenly in [Y].

For the commonly-used logistic transfer function, the resulting learning rules can be proven to be as follows:

$$[\Delta W] = \epsilon\left(([1] - 2[Y])[X]^{T} + [[W]^{T}]^{-1}\right) \qquad \text{[Eqn. 15]}$$

$$[\Delta W_0] = \epsilon\,([1] - 2[Y]) \qquad \text{[Eqn. 16]}$$

In Eqn. 15, the first anti-Hebbian term has become an outer product of vectors and the second anti-decay term has generalized to an "anti-redundancy" term in the form of the inverse of the transpose of the weight matrix [W]. Eqn. 15 can be written, for an individual weight W_ij, as follows:

$$\Delta W_{ij} = \epsilon\left(\frac{\operatorname{cof}[W_{ij}]}{\det[W]} + X_j(1 - 2Y_i)\right) \qquad \text{[Eqn. 17]}$$

where cof[W_ij] denotes the cofactor of element W_ij, which is known to be (-1)^(i+j) times the determinant of the matrix obtained by removing the i-th row and the j-th column from the square weight matrix [W], and ε is the learning rate. Similarly, the i-th bias measure ΔW_i0 can be expressed as follows:

$$\Delta W_{i0} = \epsilon\,(1 - 2Y_i) \qquad \text{[Eqn. 18]}$$

The rules shown in Eqns. 17-18 are the same as those for
the single unit mapping (Eqns. 11-12) except that the
instability occurs at det[W]=0 instead of w=0. Thus, any
degenerate weight matrix leads to instability because any
weight matrix having a zero determinant is degenerate. This
fact enables different outputs Y_i to learn to represent
different things about the inputs X_j. When the weight vectors
entering two different outputs become too similar, det[W]
becomes small and the natural learning process forces these
approaching weight vectors apart. This effect is mediated by
the numerator cof[W_ij], which approaches zero to indicate
degeneracy in the weight matrix of the rest of the layer not
associated with input X_j or output Y_i.
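
As a rough illustration of the logistic learning rules of Eqns. 17-18 applied in a global (batch-averaged) mode, the following Python sketch separates two synthetic sources. The Laplacian test sources, the 2x2 mixing matrix, the batch size, the step count and all function names are assumptions made for this sketch and do not come from the specification; only the update expressions follow Eqns. 17-18.

```python
import math
import random


def logistic(u):
    """Logistic transfer function of Eqn. 7, written with tanh to avoid overflow."""
    return 0.5 * (1.0 + math.tanh(0.5 * u))


def inverse_transpose_2x2(w):
    """Anti-redundancy term: element (i, j) equals cof[W_ij] / det[W]."""
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    return [[w[1][1] / det, -w[1][0] / det],
            [-w[0][1] / det, w[0][0] / det]]


def global_mode_update(w, w0, batch, lr):
    """Average Eqns. 17-18 over a batch, then adjust [W] and [W_0] in place."""
    anti_redundancy = inverse_transpose_2x2(w)
    dw = [[0.0, 0.0], [0.0, 0.0]]
    dw0 = [0.0, 0.0]
    for x in batch:
        for i in range(2):
            y = logistic(w[i][0] * x[0] + w[i][1] * x[1] + w0[i])
            for j in range(2):
                dw[i][j] += anti_redundancy[i][j] + x[j] * (1.0 - 2.0 * y)
            dw0[i] += 1.0 - 2.0 * y
    for i in range(2):
        for j in range(2):
            w[i][j] += lr * dw[i][j] / len(batch)
        w0[i] += lr * dw0[i] / len(batch)


def separate_two_sources(updates=4000, batch_size=10, lr=0.02, seed=0):
    """Train on synthetic mixtures and return the product [W][A]."""
    rng = random.Random(seed)
    a = [[1.0, 0.4], [-0.3, 1.0]]   # assumed mixing matrix [A]
    w = [[1.0, 0.0], [0.0, 1.0]]    # initial unmixing matrix [W]
    w0 = [0.0, 0.0]                 # initial bias vector [W_0]
    for _ in range(updates):
        batch = []
        for _ in range(batch_size):
            # Laplacian samples are super-Gaussian (kurtosis above 3).
            s = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(2)]
            batch.append([a[0][0] * s[0] + a[0][1] * s[1],
                          a[1][0] * s[0] + a[1][1] * s[1]])
        global_mode_update(w, w0, batch, lr)
    return [[w[i][0] * a[0][j] + w[i][1] * a[1][j] for j in range(2)]
            for i in range(2)]
```

If separation succeeds, each row of the returned product should be dominated by a single entry, with the dominant entries falling in different columns. The sketch omits safeguards such as learning-rate annealing or a check for det[W] approaching zero.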

Other sigmoidal transformations yield other training rules that are similarly advantageous as discussed above in connection with Eqn. 8. For instance, the hyperbolic tangent function yields rules very similar to those of Eqns. 17-18:

$$\Delta W_{ij} = \epsilon\left(\frac{\operatorname{cof}[W_{ij}]}{\det[W]} - 2X_jY_i\right) \qquad \text{[Eqn. 19]}$$

$$\Delta W_{i0} = \epsilon\,(-2Y_i) \qquad \text{[Eqn. 20]}$$

The usefulness of these blind source separation network learning rules can be appreciated with reference to the discussion below in connection with FIG. 5.

Blind Deconvolution in a Causal Filter

FIGS. 2B-2C illustrate the blind deconvolution problem. FIG. 2C shows an unobserved data sequence S(t) entering an unknown channel A(t), which responsively produces the measured signal X(t) that can be blindly equalized through a causal filter W(t) to produce an output signal U(t) approximating the original unobserved data sequence S(t). FIG. 2B shows the time series X(t), which is presumed to have a length of J samples (not shown). X(t) is convolved with a causal filter having L weighted taps, W_1, ..., W_L and impulse response W(t). The causal filter output signal U(t) is then passed through a non-linear sigmoid function g(.) to create the training signal Y(t) (not shown). This system can be expressed either as a convolution (Eqn. 21) or as a matrix equation (Eqn. 22) as follows:

$$Y(t) = g(U(t)) = g(W(t) * X(t)) \qquad \text{[Eqn. 21]}$$

$$[Y] = g([U]) = g([W][X]) \qquad \text{[Eqn. 22]}$$

in which [Y]=g([U]) and [X] are signal sample vectors having J samples. Of course, the vector ordering need not be temporal. For causal filtering, [W] is a banded lower triangular JxJ square matrix expressed as:

$$[W] = \begin{bmatrix} W_L & 0 & \cdots & & 0 \\ W_{L-1} & W_L & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ W_1 & \cdots & W_L & \cdots & 0 \\ \vdots & \ddots & & \ddots & \vdots \\ 0 & \cdots & W_1 & \cdots & W_L \end{bmatrix} \qquad \text{[Eqn. 23]}$$

Assuming an ensemble of time series, the joint probability distribution functions f_Y([Y]) and f_X([X]) are related by the Jacobian of the Eqn. 22 transformation according to Eqn. 13. The ensemble can be "created" from a single time series by breaking the series into sequences of length L, which reduces [W] in Eqn. 23 to an LxL lower triangular matrix. The Jacobian of the transformation is then written as follows:

$$J = \det\left[\frac{\partial Y(t_i)}{\partial X(t_j)}\right]_{ij} = (\det[W]) \prod_{t} Y'(t) \qquad \text{[Eqn. 24]}$$

which may be decomposed into the determinant of the weight matrix [W] of Eqn. 23 and the product of the slopes of the sigmoidal squashing function for all times t. Because [W] is lower-triangular, its determinant is merely the product of the diagonal values, each of which is the leading weight W_L. As before, the output signal entropy H(Y) is maximized by maximizing the logarithm of the Jacobian, which may be written as:

$$\ln|J| = \ln|\det[W]| + \sum_{t} \ln|Y'(t)| \qquad \text{[Eqn. 25]}$$

If the hyperbolic tangent is selected as the non-linear sigmoid function, then differentiation with respect to the filter weights W(t) provides the following two simple learning rules:

$$\Delta W_L \propto \sum_{t}\left(\frac{1}{W_L} - 2X_tY_t\right) \qquad \text{[Eqn. 26]}$$

$$\Delta W_{L-j} \propto \sum_{t}\left(-2X_{t-j}Y_t\right), \quad j>0 \qquad \text{[Eqn. 27]}$$

In Eqns. 26-27, W_L is the "leading weight" and W_{L-j} (j>0) represent the remaining weights in a delay line having L weighted taps linking the input signal sample X_{t-j} to the output signal sample Y_t. The leading weight W_L therefore adapts like a weight connected to a neuron with only that one input (Eqn. 11 above). The other tap weights {W_{L-j}} attempt to decorrelate the past input from the present output. Thus, the leading weight W_L keeps the causal filter from "shrinking".

Other sigmoidal functions may be used to generate similarly useful learning rules, as discussed above in connection with Eqn. 8. The equivalent rules for the logistic transfer function discussed above can be easily deduced to be:

$$\Delta W_L \propto \sum_{t}\left(\frac{1}{W_L} + X_t(1-2Y_t)\right) \qquad \text{[Eqn. 28]}$$

$$\Delta W_{L-j} \propto \sum_{t} X_{t-j}(1-2Y_t), \quad j>0 \qquad \text{[Eqn. 29]}$$

The usefulness of these causal filter learning rules can be appreciated with reference to the discussion below in connection with FIGS. 6 and 7.
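
The causal filter rules for the hyperbolic tangent case (a leading-weight term of 1/W_L - 2 X_t Y_t and a term of -2 X_{t-j} Y_t for each remaining tap) can be sketched as below. This is a minimal sketch: the function names, the convention that the last list element holds the leading weight W_L, and the treatment of samples before t=0 as absent are assumptions made for illustration.

```python
import math


def filter_output(x, w, t):
    """Causal filter output u_t = sum_j w[L-1-j] * x[t-j]; w[-1] is the leading weight."""
    taps = len(w)
    return sum(w[taps - 1 - j] * x[t - j] for j in range(taps) if t - j >= 0)


def deconvolution_updates(x, w):
    """Unscaled updates for every tap, accumulated over the whole series x."""
    taps = len(w)
    dw = [0.0] * taps
    for t in range(len(x)):
        y = math.tanh(filter_output(x, w, t))
        # Leading weight: anti-decay term 1/W_L plus anti-Hebbian term.
        dw[taps - 1] += 1.0 / w[taps - 1] - 2.0 * x[t] * y
        # Remaining taps: decorrelate past input from present output.
        for j in range(1, taps):
            if t - j >= 0:
                dw[taps - 1 - j] += -2.0 * x[t - j] * y
    return dw
```

A caller would scale the returned list by a learning rate and add it to the tap weights, repeating over many passes; the leading weight must start non-zero because of the 1/W_L term.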
Information Maximization v. Statistical Dependence

The process of this invention relies on the unexpectedly
advantageous observation that, under certain conditions, the
maximization of the mutual information I(Y,X) operates to
minimize the mutual information between separate outputs
{U_i} in a multiple source network, thereby performing the
redundancy reduction required to solve the blind signal
processing problem. The usefulness of this relationship was
unsuspected until now. When limited to the usual logistic
transfer or hyperbolic tangent sigmoid functions, this
invention appears to be limited to the general class of
super-Gaussian signals having kurtosis greater than 3. This
limitation can be understood by considering the following
example shown in FIGS. 3A-3C.

Referring to FIG. 3A, consider a network with two
outputs y_1 and y_2, which may be either two output channels
from a blind source separation network or two signal
samples at different times for a blind deconvolution network.
The joint entropy of these two variables can be written as:

$$H(y_1,y_2) = H(y_1) + H(y_2) - I(y_1,y_2) \qquad \text{[Eqn. 30]}$$


Thus, the joint entropy can be maximized by maximizing
the individual entropies while minimizing the mutual
information I(y_1,y_2) shared between the two. When the mutual
information I(y_1,y_2) is zero, the two variables y_1 and y_2 are
statistically independent and the joint probability density
function is equal to the product of the individual probability
density functions so that f(y_1,y_2) = f(y_1) f(y_2). The
ICA and the "whitening" approach to deconvolution are
examples of pair-wise minimization of mutual information
I(y_1,y_2) for all pairs y_1 and y_2. This process is variously
denominated factorial code learning, predictability
minimization, independent component analysis (ICA) and
redundancy reduction.
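
The relationship between joint entropy, individual entropies and mutual information can be checked numerically on a small discrete example. The sketch below is an illustration only: it uses discrete probability tables rather than the differential entropies of the specification, and the function names and example tables are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def joint_entropy_terms(joint):
    """joint[i][j] = P(y1=i, y2=j). Returns H(y1), H(y2), H(y1,y2), I(y1,y2)."""
    p1 = [sum(row) for row in joint]
    p2 = [sum(col) for col in zip(*joint)]
    h1 = entropy(p1)
    h2 = entropy(p2)
    h12 = entropy([p for row in joint for p in row])
    mutual = sum(p * math.log(p / (p1[i] * p2[j]))
                 for i, row in enumerate(joint)
                 for j, p in enumerate(row) if p > 0.0)
    return h1, h2, h12, mutual
```

For an independent table such as [[0.25, 0.25], [0.25, 0.25]] the mutual information is zero and the joint entropy equals the sum of the individual entropies; for a dependent table such as [[0.4, 0.1], [0.1, 0.4]] the joint entropy falls short of that sum by exactly the mutual information.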

The process of this invention is a stochastic gradient
ascent procedure that maximizes the joint entropy H(y_1,y_2),
thereby differing sharply from these "whitening" and ICA
procedures known for minimizing mutual information
I(y_1,y_2). The system of this invention rests on the unexpectedly
advantageous discovery of the general conditions under
which maximizing joint entropy operates to reduce mutual
information (redundancy), thereby reducing the statistical
dependence of the two outputs y_1 and y_2.

Under many conditions, maximizing joint entropy
H(y_1,y_2) does not guarantee minimization of mutual information
I(y_1,y_2) because of interference from the other single
entropy terms H(y_i) in Eqn. 30. FIG. 3C shows one
pathological example where a "diagonal" projection of two
independent, uniformly-distributed variables x_1 and x_2 is
preferred over the "independent" projection shown in FIG.
3B when joint entropy is maximized. This occurs because of
a mismatch between the requisite alignment of input
probability distribution function and sigmoid slope discussed
above in connection with FIGS. 1A-1C and Eqn. 8. The
learning procedure of this invention achieves the higher
value of joint entropy shown in FIG. 3C than the desired
value shown in FIG. 3B because of the higher individual
output entropy values H(y_i) arising from the triangular
probability distribution functions of (x_1+x_2) and (x_1-x_2) of
FIG. 3C, which more closely match the sigmoid slope (not
shown). This interferes with the minimization of mutual
information I(y_1,y_2) because the individual entropy H(y_i)
increases offset or mask undesired increases in mutual
information to provide the higher joint entropy H(y_1,y_2)
sought by the process.

The inventor believes that such interference has little
significant effect in most practical situations, however. As
mentioned above in connection with Eqn. 8, the sigmoidal
function is not limited to the usual two functions and indeed
can be tailored to the particular class of probability
distribution functions expected by the process of this invention.
Any function that is a member of the class of solutions to the
partial differential Eqn. 8 provides a sigmoidal function
suitable for use with the process of this invention. It can be
shown that this general class of sigmoidal functions leads to
the following two learning rules according to this invention:

$$[\Delta W] = \epsilon\left([[W]^{T}]^{-1} - r\,|[Y]|^{r-1}\operatorname{sgn}([Y])\,[X]^{T}\right) \qquad \text{[Eqn. 31]}$$

$$[\Delta W_0] = \epsilon\left(-r\,|[Y]|^{r-1}\operatorname{sgn}([Y])\right) \qquad \text{[Eqn. 32]}$$

where sgn(x) is +1 for x>0, 0 for x=0 and -1 for x<0,
and where parameter r is chosen appropriately for the
presumed kurtosis of the probability distribution function of
the source signals [S]. This formalism can be extended to
cover skewed and multimodal input distributions by
extending Eqn. 8 to produce an increasingly complex
polynomial in g(x) such that

$$\frac{d}{dx} g(x) = G(g(x)).$$

Even with the usual logistic transfer function (Eqn. 7) and
the hyperbolic tangent function (r=2), it appears that the
problem of individual entropy interference is limited to
sub-Gaussian probability distribution functions having a
kurtosis less than 3. Advantageously, many actual analog
signals, including the speech signals used in the experimental
verification of the system of this invention, are
super-Gaussian in distribution. They have longer tails and are more
sharply peaked than the Gaussian distribution, as may be
appreciated with reference to the three distribution functions
shown in FIGS. 8A-8C. FIG. 8A shows a typical speech
probability distribution function, FIG. 8B shows the
probability distribution function for rock music and FIG. 8C
shows a typical Gaussian white noise distribution. The
inventor has found that joint entropy maximization for
sigmoidal networks always minimizes the mutual information
between the network outputs for all super-Gaussian
signal distributions tested. Special sigmoid functions can be
selected that are suitable for accomplishing the same result
for sub-Gaussian signal distributions as well, although the
precise learning rules must be selected in accordance with
the parametric learning rules of Eqns. 31-32.
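
The distinction between super-Gaussian (kurtosis above 3) and sub-Gaussian (kurtosis below 3) signals can be checked on synthetic samples. In the sketch below the Laplacian, Gaussian and uniform generators merely stand in for the kinds of distributions discussed; they are assumptions of this illustration, not the speech or music data of FIGS. 8A-8C, and the function names are hypothetical.

```python
import random


def kurtosis(samples):
    """Fourth central moment divided by the squared variance (3 for a Gaussian)."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2)


def draw(kind, n, rng):
    """Draw n samples from a named test distribution."""
    if kind == "laplace":    # longer tails, sharper peak: super-Gaussian
        return [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]
    if kind == "gauss":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if kind == "uniform":    # flat, no tails: sub-Gaussian
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]
    raise ValueError(kind)
```

With a few tens of thousands of samples, the Laplacian estimate lands well above 3, the Gaussian estimate near 3 and the uniform estimate well below 3.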

Different sigmoid non-linearities provide different
anti-Hebbian terms. Table 1 provides the anti-Hebbian terms
from the learning rules resulting from several interesting
non-linear transformation functions. The
information-maximization rule consists of an anti-redundancy term
which always has a form of [[W]^T]^{-1} and an anti-Hebbian
term that keeps the unit from saturating.

TABLE 1

[Table not recoverable from the extraction. Surviving column headings: Function; Slope; Anti-Hebb term. Surviving row: Eqn. 8 solution, with slope (1 - |y_i|^r).]
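
The way a sigmoid's slope determines its anti-Hebbian factor can be checked numerically: Eqn. 6 differentiates the log of the slope, and for the logistic and hyperbolic tangent functions the resulting factors are (1 - 2y) and -2y, matching Eqns. 12 and the tanh bias rule. The sketch below is an illustration only; the finite-difference step and the function names are assumptions.

```python
import math


def logistic(u):
    """Logistic transfer function of Eqn. 7."""
    return 1.0 / (1.0 + math.exp(-u))


def log_slope_derivative(g, u, h=1e-3):
    """Central-difference estimate of d/du ln|g'(u)|."""
    def log_slope(v):
        return math.log(abs((g(v + h) - g(v - h)) / (2.0 * h)))
    return (log_slope(u + h) - log_slope(u - h)) / (2.0 * h)


def anti_hebbian_factor(kind, u):
    """Closed-form factor that multiplies the input in the learning rule."""
    if kind == "logistic":
        return 1.0 - 2.0 * logistic(u)
    if kind == "tanh":
        return -2.0 * math.tanh(u)
    raise ValueError(kind)
```

Comparing `log_slope_derivative(logistic, u)` with `anti_hebbian_factor("logistic", u)`, and likewise for `math.tanh`, should show agreement to within the finite-difference error at moderate values of u.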



Table 1 shows that only the Eqn. 8 solutions (including the hyperbolic tangent function for r=2) and the logistic transfer functions produce anti-Hebbian terms that can yield higher-order statistics. The other functions use the net input u_i as the output variable rather than using the actual transformed output y. Tests performed by the inventor show that the erf function is unsuitable for blind separation. In fact, stable weight matrices using the [...] can be calculated from the covariance matrix of the inputs alone. The learning rule for a Gaussian radial basis function node is interesting because it contains u_i in both the numerator and denominator. The denominator term limits the usefulness of such a rule because data points near the radial basis function center would cause instability. Radial transfer functions are gen[...]

[...] bias weights {W_i0} are updated regularly according to the [...] and each of the sixteen scaling weights {W_ij} are updated regularly according to the learning rule of Eqn. 17 discussed above. These updates can occur after every signal sample or may be accumulated over many signal samples for updating in a global mode. Each of the weight elements in FIG. 5 exemplified by element 18 includes the logic necessary to produce and accumulate the ΔW update according to the applicable [...]

The separation network in FIG. 5 can also be used to remove interfering signals from a receive signal merely by, for example, isolating the interferer as output signal U_i and then subtracting U_i from the receive signal of interest, such as receive signal X_j. In such a configuration, the network shown in FIG. 5 is herein denominated an "interference cancelling" network.

FIG. 6 shows a functional block diagram illustrating a simple causal filter operated according to the method of this invention for blind deconvolution. A time-varying signal is presented to the network at input 22. The five spaced taps are separated by a time-delay interval in the manner well-known in the art for transversal filters. The five weight factors {W_i} are established and updated by internal logic according to the learning rules shown in Eqns. [...] above. The five weighted tap signals {U_i} are summed at a summation device 24 to produce the single time-varying output signal U_t. Because input signal X_t only includes an unknown non-linear combination of time-delayed versions of an unknown source signal S_t, the system of this invention adjusts the tap weights {W_i} such that output signal U_t approximates the unknown source signal S_t.

FIG. 7 shows a functional block diagram illustrating the combination of blind source separation network and blind deconvolution filter systems of this invention. The blind separation learning rules and the blind deconvolution rules discussed above can be easily combined in the form exemplified by FIG. 7. The objective is to maximize the natural logarithm of a Jacobian with local lower triangular structure, which yields the expected learning rule that forces the leading weights to follow the blind separation rules and all others to follow a decorrelation rule, except that tapped weights are interposed between an input and an output.

The outputs {U_i} are used to produce a set of training signals given by Eqn. 33:

[Eqn. 33 not recoverable from the extraction.]

[...] sigmoidal transfer function. If the hyperbolic tangent function is selected as the sigmoidal non-linearity, the following training rules are used in the [...] of this invention:

[Eqns. 34-35 not recoverable from the extraction. The surviving fragment of Eqn. 35 has the form ε(-2XY) with a condition on the tap index.]

[...] summing elements exemplified by summing circuit 26. Plane 24 contains the lead weights for the 16 [...] formed by the network. Preliminary [...] by the inventor with speech signals in which signals were simultaneously separated and deconvolved using the learning rule discussed above resulted in recovery of apparently perfect speech.

Experimental Results

The inventor conducted experiments using three-second segments of speech recorded from various speakers with only one speaker per recording. All speech segments were sampled at 8000 Hz from the output of the auxiliary microphone of a Sparc-10 workstation. No special post-processing was performed on the waveforms other than the normalization of amplitudes to a common interval [-3,3] to permit operation with the equipment used. The network was trained using the stochastic gradient ascent procedure of this invention.

Unsupervised learning in a neural network may proceed either continuously or in a global mode. Continuous learning consists in slightly modifying the weights after each propagation of an input vector through the network. This kind of learning is useful for signals that arrive in real time or when local storage capacity is restricted. In a global learning mode, a multiplicity of samples are propagated through the network and the results stored locally. [...] computed exactly on these data and the weights are modified only after accumulating and processing the multiplicity of signal samples.

To reduce computational overhead, these [...] were performed using the global learning mode. To ensure that the input ensemble is stationary in time, random points were selected from the three-second window to generate the appropriate input vectors. Various learning rates were tested with 0.005 preferred. As used herein, learning rate ε establishes the actual weight adjustment such that W_new = W_old + εΔW, as is known in the art. The inventor found that reducing the learning rate over the learning process was useful.

Blind Separation Results: The network architecture shown in FIGS. 2A and 5 together with the learning rules in Eqns. 17-18 were found to be sufficient to perform blind separation of at least seven unknown source signals. A random mixing matrix [A] was generated with values usually in the interval [-1,1]. The mixing matrix [A] was used to generate the several mixed time series [X_i] from the original sources [S_i]. The unmixing matrix [W] and the bias vector [W_0] were then trained according to the rules in Eqns. 17-18.

FIG. 10 shows the results of the attempted separation of five source signals. The mixtures [X_i] formed an incomprehensible babble that could not be penetrated by the human ear. The unmixed solutions shown as [Y_i] were obtained after presenting about 500,000 time samples, equivalent to 20 passes through the complete three-second series. Any residual interference in the output vector elements [Y_i] is inaudible to the human ear. This can be appreciated with reference to the permutation structure of the product of the final weight matrix [W] and the initial mixing matrix [A]:

As can be seen, the residual interference factors are only
a few percent of the single substantial entry in each row and
column, thereby demonstrating that weight matrix [W]
substantially removes all effects of mixing matrix [A] from
the signals.
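
The "permutation structure" test described above, one substantial entry per row and column with only small residual interference elsewhere, can be sketched as follows. The function names, the tolerance and the example matrices are hypothetical and are not the experimental values of the specification.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def interference_ratios(product):
    """Per row: second-largest magnitude divided by the largest magnitude."""
    ratios = []
    for row in product:
        magnitudes = sorted((abs(v) for v in row), reverse=True)
        ratios.append(magnitudes[1] / magnitudes[0])
    return ratios


def is_scaled_permutation(product, tol):
    """True when each row peaks in a distinct column and residuals stay within tol."""
    peaks = [max(range(len(row)), key=lambda j: abs(row[j])) for row in product]
    distinct = len(set(peaks)) == len(product)
    return distinct and all(r <= tol for r in interference_ratios(product))
```

For instance, `is_scaled_permutation(matmul(w, a), 0.05)` would accept a product whose off-peak entries are at most five percent of the peak entry in each row.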

In a second experiment, seven source signals, including
five speaking voices, a rock music selection and white noise,
were successfully separated, although the separation was
still slowly improving after 2.5 million iterations, equivalent
to 100 passes through the three-second data. For two
sources, convergence is normally achieved in less than one
pass through the three seconds of data by the system of this
invention.

The blind separation procedure of mis invention was 



The first whitening example shows what happens when 
"deconvolving" a speech signal that has not been corrupted 
(convolving filter [A) is a delta-function). If the tap spacing 
is dose enough, as In this case where the tap spacing is 
5 identical to the sample internal, the process of this invention 
learns the whitening filter shown in FIG. UC that flattens the 
amplitude spectrum of the speech up to the Nyquist limit 
(equivalent to half of the sampling frequency). FIG- 9A 
shows the spectrum of the speech sequence before decon- 
10 volution and FIG. 9B shows the speech spectrum after 
decon volution by the filter shown in FIG. UC Whitened 
speech sounds like a clear sharp version of the original 
signal because the phase structure is preserved. By using all 
available frequency levels equally, the system is maximizing 
15 information throughput in the channel Thus, when the 
original signal is not white, the deconvolving filter of this 
invention will recover a whitened version of it rather than 
the exact original. However, when the filter taps are spaced 
further apart, as in FIGS. llB-llL there is less or^ortunity 
20 for simple whitening. 

In the second "barrel-effect** example shown in FIG. 11E, 
a 6.25 ms echo is added to the speech signal This creates a 
mild audible barrel effect Because filter 11E is finite in 
length, its inverse is infinite in length but is shown in FIG. 
25 11 F as truncated. The Inverting filter learned in FIG. 110 
resembles FIG. 11F although the resemblance tails off 
toward the left side because the process of this invention 
actually learns an optimal filter of finite length instead of a 
truncated infinite optimal filter. The resulting deconvolution 
30 shown in FIG. 11H is very good. 

The best results from the blind deconvolution process of
this invention are seen when the ideal deconvolving filter is
of finite length, as in the third example shown in FIGS.
11I-11L. FIG. 11I shows a set of exponentially-decaying



L Gaussian white noise, and (b> when the mixing matrix (Al two^okt filter shown to HO. ™™^»J^ 

I \^Zri*Brti weaknesses arTontouindable «" "^J^?, f *™ c *°° 

l^.t^tn separate independent Gaussian ""J* ^ Z^J^JS,"™ 

^doninEon. 17 quite unstable In the vicinity of a ^C^«S^ 

tap-spacing is great enough (100 sample intervals) that
simple whitening cannot interfere noticeably with the
deconvolution process.

Clearly, other embodiments and modifications of this



solution. 

In contrast with these results, experience with similar tests 
of the HJ network shows it occasionally fails to converge for 
two sources and rarely converges for three sources. 

Blind Deconvolution Results: Speech signals were convolved
with various filters and the learning rules in Eqns.
26-27 were used to perform blind deconvolution. Some
results are shown in FIGS. 11A-11L. The convolving filter
time domains shown in FIGS. 11A, 11E and 11I contained
some zero values. For example, FIG. 11E represents the
filter [0.8,0,0,0,1]. Moreover, the taps were sometimes adjacent
to each other, as in FIGS. 11A-11D, and sometimes
spaced apart in time, as in FIGS. 11I-11L. The leading
weight of each filter is the right-most bar in each histogram,
exemplified by bar 30 and, in FIG. 11G, by bar 32.
The whitening experiment is shown in FIGS. 11A-11D, the
barrel-effect experiment in FIGS. 11E-11H and the
echo experiment in FIGS. 11I-11L. For each of these three
experiments, the time domain characteristics of convolving
filter [A] are shown, followed by those of the ideal deconvolving
filter [W_ideal], those of the filter produced by the
process of this invention [W] and the time domain pattern
produced by convolution of [W] and [A]. Ideally, the convolution
[W]*[A] should be a delta-function consisting of
only a single high value at the right-most position of the
leading weight when [W] correctly inverts [A].

invention may occur readily to those of ordinary skill in the
art in view of these teachings. Therefore, this invention is to
be limited only by the following claims, which include all
such embodiments and modifications when viewed in conjunction
with the above specification and accompanying
drawing.

I claim:

1. A method performed in a neural network having input
means for receiving a plurality J of input signals (X_j) and
output means for producing a plurality I of output signals
(U_i), each said output signal U_i representing a combination of
said input signals (X_j) weighted by a plurality I of bias
weights (W_i0) and a plurality I² of scaling weights (W_ij) such
that (U_i) = (W_ij)(X_j) + (W_i0), said method minimizing the
information redundancy among said output signals (U_i),
wherein 0<i≤I>1 and 0<j≤J>1 are integers, said method
comprising:

(a) selecting initial values for said bias weights (W_i0) and
said scaling weights (W_ij);

(b) producing a plurality I of training signals (Y_i) responsive
to a transformation of said input signals (X_j) such
that Y_i = g(U_i), wherein g(x) is a nonlinear function and
the Jacobian of said transformation is J = det(∂Y_i/∂X_j)
when J = I; and

(c) adjusting said bias weights (W_i0) and said scaling
weights (W_ij) responsive to one or more samples of said
training signals (Y_i) such that each said bias weight
W_i0 is changed proportionately to a corresponding bias
measure ΔW_i0 accumulated over said one or more
samples and each said scaling weight W_ij is changed
proportionately to a corresponding scaling measure
ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said one or
more samples, wherein ε>0 is a learning rate.
2. The method of claim 1 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of the solutions to the equation

∂g(x)/∂x = 1 - |g(x)|^r

and said ΔW_i0 = ε(-r|Y_i|^(r-1) sgn(Y_i)) accumulated over said one
or more samples and each said scaling weight W_ij is changed
proportionately to a corresponding scaling measure ΔW_ij =
ε(cof(W_ij)/det(W_ij) - r X_j |Y_i|^(r-1) sgn(Y_i)) accumulated over
said one or more samples.

3. The method of claim 1 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of g_1(x) = tanh(x) and g_2(x) = (1+e^(-x))^(-1) and said
ΔW_i0 selected from the group consisting essentially of
Δ_1 W_i0 = ε(-2Y_i) and Δ_2 W_i0 = ε(1-2Y_i) accumulated over
said one or more samples and each said scaling weight W_ij
is changed proportionately to the corresponding scaling
measure ΔW_ij selected from the group consisting essentially
of Δ_1 W_ij = ε(cof(W_ij)/det(W_ij) - 2X_j Y_i) and Δ_2 W_ij =
ε(cof(W_ij)/det(W_ij) + X_j(1-2Y_i)) accumulated over said one or
more samples.

4. A neural-network implemented method for recovering
one or more of a plurality I of independent source signals
(S_i) from a plurality J≥I of sensor signals (X_j) each including
a combination of at least some of said source signals (S_i),
wherein 0<i≤I>1 and 0<j≤J≥I are integers, said method
comprising:

(a) selecting a plurality I of bias weights (W_i0) and a
plurality I² of scaling weights (W_ij);

(b) adjusting said bias weights (W_i0) and said scaling
weights (W_ij) by repeatedly performing the steps of:

(b.1) producing a plurality I of estimation signals (U_i)
responsive to said sensor signals (X_j) such that
(U_i) = (W_ij)(X_j) + (W_i0),

(b.2) producing a plurality I of training signals (Y_i)
responsive to a transformation of said sensor signals
(X_j) such that Y_i = g(U_i), wherein g(x) is a nonlinear
function and the Jacobian of said transformation is
J = det(∂Y_i/∂X_j) when J = I, and

(b.3) adjusting each said bias weight W_i0 and each said
scaling weight W_ij responsive to one or more
samples of said training signals (Y_i) such that said
each bias weight W_i0 is changed proportionately to a
bias measure ΔW_i0 accumulated over said one or
more samples and said each scaling weight W_ij is
changed proportionately to a corresponding scaling
measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said
one or more samples, wherein ε>0 is a learning rate;
and

(c) producing said estimation signals (U_i) to represent said
one or more recovered source signals (S_i).

5. The method of claim 4 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of the solutions to the equation

∂g(x)/∂x = 1 - |g(x)|^r

and said ΔW_i0 = ε(-r|Y_i|^(r-1) sgn(Y_i)) accumulated over said
one or more samples and each said scaling weight W_ij is
changed proportionately to a corresponding scaling measure
ΔW_ij = ε(cof(W_ij)/det(W_ij) - r X_j |Y_i|^(r-1) sgn(Y_i)) accumulated
over said one or more samples.

6. The method of claim 4 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of g_1(x) = tanh(x) and g_2(x) = (1+e^(-x))^(-1) and said
adjusting comprises:

(c) adjusting said bias weights (W_i0) and said scaling
weights (W_ij) responsive to one or more samples of said
training signals (Y_i) such that each said bias weight W_i0
is changed proportionately to a corresponding bias
measure ΔW_i0 selected from the group consisting
essentially of Δ_1 W_i0 = ε(-2Y_i) and Δ_2 W_i0 = ε(1-2Y_i)
accumulated over said one or more samples and each
said scaling weight W_ij is changed proportionately to
the corresponding scaling measure ΔW_ij selected
from the group consisting essentially of Δ_1 W_ij =
ε(cof(W_ij)/det(W_ij) - 2X_j Y_i) and Δ_2 W_ij =
ε(cof(W_ij)/det(W_ij) + X_j(1-2Y_i)) accumulated over said one or more
samples.

7. A method implemented in a transversal filter having an
input for receiving a sensor signal X that includes a combination
of multipath reverberations of a source signal S and
having a plurality I of delay line tap output signals (T_i)
distributed at intervals of one or more time delays τ, said
source signal S and said sensor signal X varying with time
over a plurality J≥I of said time delay intervals τ such that
said sensor signal X has a value X_j at time τ(j-1) and
said delay line tap output signal T_i has a value X_(j+1-i)
representing said sensor signal value X_j delayed by a time
interval τ(i-1), wherein τ>0 is a predetermined constant and
0<i≤I>1 and 0<j≤J≥I are integers, said method recovering
said source signal S from said sensor signal X and comprising:

(a) selecting a plurality I of filter weights (W_i);

(b) adjusting said filter weights (W_i) by repeatedly performing
the steps of:

(b.1) producing a plurality K=I of weighted tap output
signals (V_k) by combining said delay line tap output
signals (T_i) such that (V_k) = (F_ki)(T_i), wherein
0<k≤K=I>1 are integers, and wherein F_ki = W_(k+1-i)
when 1≤k+1-i≤I and F_ki = 0 otherwise,

(b.2) summing a plurality K=I of said weighted tap
signals (V_k) to produce an estimation signal U,
wherein said estimation signal U has a value U_j at time
τ(j-1),

(b.3) producing a plurality J of training signals (Y_j)
responsive to a transformation of said sensor signal
values (X_j) such that Y_j = g(U_j), wherein g(x) is a
nonlinear function and the Jacobian of said transformation
is J = det(∂Y/∂X) when J = I, and

(b.4) adjusting each said filter weight W_i responsive to
one or more samples of said training signals (Y_j)
such that said each filter weight W_i is changed
proportionately to a corresponding leading measure
ΔW_1 accumulated over said one or more samples
when i=1 and a corresponding scaling measure ΔW_i =
ε·∂(ln|J|)/∂W_i accumulated over said one or more
samples otherwise; and

(c) producing said estimation signal U to represent said
recovered source signal S.
8. The method of claim 7 wherein said nonlinear function
g(x) is a nonlinear function selected from a group consisting
essentially of g_1(x) = tanh(x) and g_2(x) = (1+e^(-x))^(-1) and said
ΔW_1 selected from the group consisting essentially of



accumulated over said one or more samples when i=1 and a
corresponding scaling measure ΔW_i selected from the group
consisting essentially of



Ajft^c - . { t (-TX+iYj) tod AjW>»« ^ " 



accumulated over said one or more samples otherwise. 

9. The method of claim 7 wherein said nonlinear function 
g(x) is a nonlinear function selected from a group consisting 
essentially of the solutions to the equation 



sf ») - 1 - W«r "J a*i - 



accumulated over said one or more samples when i=l and a 
corresponding scaling measure 



accumulated over said one or more samples otherwise. 

10. A neural network for recovering a plurality of source
signals from a plurality of mixtures of said source signals,
said neural network comprising:

input means for receiving a plurality J of input signals (X_j)
each including a combination of at least some of a
plurality I of independent source signals (S_i), wherein
0<i≤I>1 and 0<j≤J≥I are integers;

weight means coupled to said input means for storing a
plurality I of bias weights (W_i0) and a plurality I² of
scaling weights (W_ij);

output means coupled to said weight means for producing
a plurality I of output signals (U_i) responsive to said
input signals (X_j) such that (U_i) = (W_ij)(X_j) + (W_i0);

training means coupled to said output means for producing
a plurality I of training signals (Y_i) responsive to a
transformation of said input signals (X_j) such that
Y_i = g(U_i), wherein g(x) is a nonlinear function and the
Jacobian of said transformation is J = det(∂Y_i/∂X_j) when J = I;

adjusting means coupled to said training means and said
weight means for adjusting said bias weights (W_i0) and
said scaling weights (W_ij) responsive to one or more
samples of said training signals (Y_i) such that each said
bias weight W_i0 is changed proportionately to a corresponding
bias measure ΔW_i0 accumulated over said
one or more samples and each said scaling weight W_ij
is changed proportionately to a corresponding scaling
measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said
one or more samples, wherein ε>0 is a learning rate.
11. The neural network of claim 10 wherein said nonlinear
function g(x) is a nonlinear function selected from a group
consisting essentially of the solutions to the equation

∂g(x)/∂x = 1 - |g(x)|^r

and said bias measure ΔW_i0 = ε(-r|Y_i|^(r-1) sgn(Y_i)) and said
scaling measure ΔW_ij = ε(cof(W_ij)/det(W_ij) - r X_j |Y_i|^(r-1) sgn(Y_i)).

12. The neural network of claim 10 wherein said nonlinear
function g(x) is a nonlinear function selected from a
group consisting essentially of g_1(x) = tanh(x) and g_2(x) =
(1+e^(-x))^(-1) and said bias measure ΔW_i0 is selected from a group
consisting essentially of Δ_1 W_i0 = -2Y_i and Δ_2 W_i0 = 1-2Y_i and
said scaling measure ΔW_ij is selected from a group consisting
essentially of Δ_1 W_ij = cof(W_ij)/det(W_ij) - 2X_j Y_i and
Δ_2 W_ij = cof(W_ij)/det(W_ij) + X_j(1-2Y_i).

13. A system for adaptively cancelling one or more
interferer signals (S_n) comprising:

input means for receiving a plurality J of input signals (X_j)
each including a combination of at least some of a
plurality I of independent source signals (S_i) that
includes said one or more interferer signals (S_n),
wherein 0<i≤I>1, 0<j≤J≥I and 0<n≤N≥1 are integers;

weight means coupled to said input means for storing a
plurality I of bias weights (W_i0) and a plurality I² of
scaling weights (W_ij);

output means coupled to said weight means for producing
a plurality I of output signals (U_i) responsive to said
input signals (X_j) such that (U_i) = (W_ij)(X_j) + (W_i0);

training means coupled to said output means for producing
a plurality I of training signals (Y_i) responsive to a
transformation of said input signals (X_j) such that
Y_i = g(U_i), wherein g(x) is a nonlinear function and the
Jacobian of said transformation is J = det(∂Y_i/∂X_j);

adjusting means coupled to said training means and said
weight means for adjusting said bias weights (W_i0) and
said scaling weights (W_ij) responsive to one or more
samples of said training signals (Y_i) such that each said
bias weight W_i0 is changed proportionately to a corresponding
bias measure ΔW_i0 accumulated over said
one or more samples and each said scaling weight W_ij
is changed proportionately to a corresponding scaling
measure ΔW_ij = ε·∂(ln|J|)/∂W_ij accumulated over said
one or more samples, wherein ε>0 is a learning rate;
and

feedback means coupled to said output means and said
input means for selecting one or more said output
signals (U_n) representing said one or more interferer
signals (S_n) for combination with said input signals
(X_j), thereby cancelling said interferer signals (S_n).
14. The system of claim 13 wherein said nonlinear
function g(x) is a nonlinear function selected from a group
consisting essentially of the solutions to the equation

∂g(x)/∂x = 1 - |g(x)|^r

and said bias measure ΔW_i0 = ε(-r|Y_i|^(r-1) sgn(Y_i)) and said
scaling measure ΔW_ij = ε(cof(W_ij)/det(W_ij) - r X_j |Y_i|^(r-1) sgn(Y_i)).

5,706,402

15. The system of claim 13 wherein said nonlinear
function g(x) is a nonlinear function selected from a group
consisting essentially of g_1(x) = tanh(x) and g_2(x) = (1+e^(-x))^(-1)
and said bias measure ΔW_i0 is selected from a group
consisting essentially of Δ_1 W_i0 = -2Y_i and Δ_2 W_i0 = 1-2Y_i and
said scaling measure ΔW_ij is selected from a group
consisting essentially of Δ_1 W_ij = cof(W_ij)/det(W_ij) - 2X_j Y_i and
Δ_2 W_ij = cof(W_ij)/det(W_ij) + X_j(1-2Y_i).

WHAT IS CLAIMED IS:

1. A medical system for separating electrocardiogram (EKG) signals, comprising:

a receiving module configured to receive a plurality J of recorded EKG signals X_j
from a plurality of EKG sensors;
a computing module configured to separate the received signals using independent
component analysis to produce a plurality I of separated signals Y_i; and
a display module configured to display the separated signals.

2. The medical system of claim 1, wherein the display module is further configured to display 
at least a portion of the separated signals in a chaos phase space portrait. 

3. The medical system of claim 2, wherein the separated signals include three components of
QRS complex, and wherein the display module is further configured to display at least the three
QRS complex components in a chaos phase space portrait.

4. The medical system of claim 1, wherein the computing module is configured to separate the
recorded signals by multiplying the recorded signals by a matrix W_ij such that Y_i = W_ij * X_j.

5. The medical system of claim 1, wherein the computing module is configured to separate the
recorded signals using a neural-network implemented method, the method comprising:

selecting a plurality I of bias weights W_i0 and a plurality I*J of scaling weights W_ij;
adjusting the bias weights W_i0 and the scaling weights W_ij to minimize information
redundancy among separated signals; and
producing separated signals Y_i such that Y_i = W_ij * X_j + W_i0.
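
The three steps recited in claim 5 (select weights, adjust them to reduce redundancy, produce separated signals) can be sketched in Python under stated assumptions. The sketch is not presented as the claimed implementation. It assumes equal numbers of inputs and outputs and a tanh nonlinearity, and it uses the update measures inverse-transpose(W) - 2·tanh(u)·x^T for the scaling weights and -2·tanh(u) for the bias weights. It relies on the identity that cof(W)/det(W) equals the inverse transpose of W. The synthetic Laplacian sources, the mixing matrix A, the learning rate, the iteration count and every function name are hypothetical.

```python
import numpy as np

def infomax_tanh(x, n_iter=2000, lr=0.05):
    """Batch gradient ascent on log|det W| + mean(sum_i log(1 - tanh(u_i)^2)).

    x has shape (channels, samples). Returns (W, w0, objective_history).
    """
    n, t = x.shape
    W = np.eye(n)              # initial scaling weights (assumed: identity)
    w0 = np.zeros((n, 1))      # initial bias weights (assumed: zero)
    history = []
    for _ in range(n_iter):
        u = W @ x + w0         # weighted combination of the inputs plus bias
        y = np.tanh(u)         # nonlinear transformation of the outputs
        # 1e-12 guards against log(0) if tanh saturates numerically
        objective = np.log(abs(np.linalg.det(W))) + np.mean(
            np.sum(np.log(1.0 - y**2 + 1e-12), axis=0))
        history.append(float(objective))
        dW = np.linalg.inv(W.T) - 2.0 * (y @ x.T) / t     # scaling-weight measure
        dw0 = -2.0 * np.mean(y, axis=1, keepdims=True)    # bias-weight measure
        W += lr * dW
        w0 += lr * dw0
    return W, w0, history

def crosstalk(p):
    """Sum over rows of (sum|row| / max|row| - 1); zero when each output follows one source."""
    p = np.abs(p)
    return float(np.sum(p.sum(axis=1) / p.max(axis=1) - 1.0))

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))              # two synthetic sources (not EKG data)
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # hypothetical mixing matrix
x = A @ s                                    # simulated recorded signals
W, w0, history = infomax_tanh(x)
separated = W @ x + w0                       # separated signals, W * X + W_0
print("cross-talk before:", crosstalk(A), "after:", crosstalk(W @ A))
```

The cross-talk figure is only available here because the mixing matrix is known in the simulation; with real recordings the mixing is unknown and separation quality would have to be judged another way.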

6. The medical system of claim 1, further comprising a database storing a plurality of EKG 
signal triggers and corresponding diagnosis, and a matching module configured to match the 
separated signals with one or more of the stored EKG signal triggers. 
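
Claim 6 does not say how the matching module compares separated signals with stored triggers. One possible approach, shown here purely as an assumption, is a sliding normalized cross-correlation with a threshold. The trigger shape, the threshold of 0.9, the dictionary layout and all function names in this Python sketch are hypothetical, and the stored diagnoses mentioned in the claim are omitted.

```python
import numpy as np

def best_match_score(signal, template):
    """Peak normalized cross-correlation of a template slid along a signal (range -1 to 1)."""
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    best = -1.0
    for start in range(len(signal) - len(template) + 1):
        window = signal[start:start + len(template)]
        w = window - window.mean()
        denom = np.linalg.norm(w) * t_norm
        if denom > 0:  # skip flat windows, which carry no shape information
            best = max(best, float(w @ t) / denom)
    return best

def match_triggers(separated, triggers, threshold=0.9):
    """Return (channel, trigger_name, score) for each pair scoring at or above threshold."""
    hits = []
    for ch, sig in enumerate(separated):
        for name, template in triggers.items():
            score = best_match_score(sig, template)
            if score >= threshold:
                hits.append((ch, name, score))
    return hits

# Hypothetical trigger library: one made-up spike shape, not clinical data.
triggers = {"example_spike": np.array([0.0, 0.2, 1.0, -0.5, 0.0])}
channel0 = np.zeros(50)
channel0[20:25] = 3.0 * triggers["example_spike"] + 1.0   # scaled, offset copy of the template
channel1 = np.sin(np.linspace(0, 4 * np.pi, 50))          # smooth signal with no spike
hits = match_triggers([channel0, channel1], triggers)
print(hits)
```

Removing the mean and normalizing makes the score insensitive to the amplitude and baseline offset of a channel, which matters because independent component analysis does not fix the scale of its outputs.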

7. A computer-implemented method of separating electrocardiogram (EKG) recording signals,
the method comprising:

receiving a first plurality of EKG recording signals from EKG sensors placed on a
patient;
separating the first plurality of EKG recording signals using independent
component analysis to produce a second plurality of separated signals; and
displaying the separated signals.

8. The method of claim 7, further comprising displaying at least a portion of the separated 
signals in a chaos phase space portrait. 

9. The method of claim 7, wherein the patient is a pregnant patient, and wherein the separated
signals include separated signals originating from the pregnant patient and separated signals
originating from a fetus.

10. The method of claim 7, wherein the displayed separated signals are used by a physician to
determine the likelihood of arrhythmia in the patient.

11. The method of claim 7, wherein the displayed separated signals are used by a physician to
determine the likelihood of myocardial infarction in the patient.

12. The method of claim 7, wherein each of the separated signals corresponds to a location on
the patient body, wherein the displayed separated signals are used by a physician to determine the
location of an abnormal heart condition in the patient according to the separated signals'
corresponding locations.

13. A computer-assisted method of detecting arrhythmia in a patient, the method comprising:

placing a first plurality of EKG sensors on a patient to produce a first plurality of
channels of recorded EKG signals;
sending the recorded signals to a computing module to separate the first plurality of
EKG recorded signals into a first plurality of channels of separated signals using
independent component analysis; and
reviewing a display of the separated signals to determine the existence of
arrhythmia in the patient.

14. The method of claim 13, wherein reviewing a display of the separated signals comprises
identifying a second set of one or more channels of separated signals that indicate arrhythmia, the
method further comprising determining a probable location of arrhythmia according to the
respective channel numbers of the second set of separated signals.

15. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing
a plurality of EKG sensors on more than 10 body surface locations of a patient's torso.

16. The method of claim 13, wherein placing a first plurality of EKG sensors comprises placing
a plurality of EKG sensors on more than 40 body surface locations of a patient's torso.

17. A cardiac rhythm management system comprising:

a cardiac signal recording module configured to record cardiac signals of a patient;
a computing module configured to separate the recorded cardiac signals into
separated signals using independent component analysis;
a detection module configured to detect or predict an abnormal condition based on
analyzing the separated cardiac signals; and
a treatment module configured to treat the patient when the abnormal condition is
detected or predicted.

18. The cardiac rhythm management system of claim 17, wherein the detection module is 
configured to compare the separated signals with a plurality of stored triggers to determine whether 
the separated signals match a stored trigger. 

19. A cardiac rhythm management system comprising:

a cardiac signal recording module configured to record cardiac signals of a patient;
a computing module configured to separate the recorded cardiac signals into
separated signals using independent component analysis;
a detection module configured to detect or predict an abnormal condition based on
analyzing the separated cardiac signals; and
a warning module configured to issue a warning when the abnormal condition is
detected or predicted.

20. The cardiac rhythm management system of claim 19, wherein the detection module is
configured to compare the separated signals with a plurality of stored triggers to determine whether
the separated signals match a stored trigger.